Path: utzoo!attcan!uunet!husc6!mailrus!ames!umd5!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!uiucdcs!uxc.cso.uiuc.edu!urbsdc!aglew
From: aglew@urbsdc.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: benchmarks
Message-ID: <28200147@urbsdc>
Date: 16 May 88 13:28:00 GMT
References: <8734@ames.arc.nasa.gov>
Lines: 44
Nf-ID: #R:ames.arc.nasa.gov:8734:urbsdc:28200147:000:2269
Nf-From: urbsdc.Urbana.Gould.COM!aglew    May 16 08:28:00 1988


>[Eugene Miya]:
>There is a mailing list devoted to performance measurement (@cs.wisc.edu).
>But they are mostly queueing theorists, not benchmarkers.  Largely
>quiet, after all SIGMETRICS'88 is what next week?

Thanks, Eugene... there are a few SIGMETRICS members who realize that it
doesn't do you any good to simulate things with queuing theory or Petri
net models until you can make measurements to calibrate the models.
Trouble is, too many people think of measurement as trivial, a solved
problem.

And by the way: benchmarking is *NOT* the be-all and end-all of measurement.
Benchmarks must be calibrated just as models must be. Eugene knows this,
but many people act as if they don't ("Why do you want to measure my system?
Can't you just run benchmarks?").

Computer system performance evaluation should start off with measurement,
with real customers, on real systems, to determine (1) what is important
to your customers, and (2) what they actually do with the system. (2) can
influence (1), as in "Look, you say that floating point speed is the most
important thing to you, but you spend 90% of your time doing integer work",
but not always: "We do integer work to fill in any slack time. A real
application has 90% slack time, but the other 10% is on time-critical paths".
"Oh".

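The exchange above can be sketched as a small calculation. This is only an
illustration under stated assumptions: the (category, seconds,
on_critical_path) tuple layout, the category names, and the helper
time_shares are hypothetical, and the numbers are invented to mirror the
90%/10% split rather than measured on any system.

```python
from collections import defaultdict


def time_shares(samples, critical_only=False):
    """Return each category's share of the measured time.

    samples: iterable of (category, seconds, on_critical_path) tuples
             (a hypothetical record format, not from any real tool).
    critical_only: if True, count only samples on a time-critical path.
    """
    totals = defaultdict(float)
    for category, seconds, on_critical_path in samples:
        if critical_only and not on_critical_path:
            continue
        totals[category] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: t / grand_total for cat, t in totals.items()}


# Invented numbers: integer work fills slack time, floating point work
# sits on the time-critical path.
samples = [
    ("integer", 90.0, False),
    ("floating point", 10.0, True),
]

overall = time_shares(samples)                       # what the system does
critical = time_shares(samples, critical_only=True)  # what the customer waits on

print("overall: ", overall)
print("critical:", critical)
```

The overall shares alone would argue for tuning integer speed; restricting
the same measurements to the critical path shows why the customer still
cares most about floating point.
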
Once you have measurements, they may be abstracted into benchmarks.
Benchmarks can be used to drive simulations, or to evaluate a new system.
Benchmarks can be used to make far more complicated "pseudo-measurements",
because you can warp the time scale to use time-costly instrumentation.

But, just as simulations without realistic workloads are useless, so are
benchmarks without an underlying rationale based on measurement. (Actually,
change "useless" to "not so useful" - sometimes they are better than nothing).

Trouble is, most texts and papers on performance evaluation assume that you
already have the measurements: given a spectrum of points in your measurement
space, here is how you produce a reasonably good workload sample. That's
computer science.
    Me, I want to be a "computer naturalist" - I want to develop measurement
techniques that can be applied easily, cheaply, to a variety of systems
and workloads, over extended periods of time.

See you in Santa Fe...