Path: utzoo!attcan!uunet!husc6!mailrus!ames!umd5!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!uiucdcs!uxc.cso.uiuc.edu!urbsdc!aglew
From: aglew@urbsdc.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: benchmarks
Message-ID: <28200147@urbsdc>
Date: 16 May 88 13:28:00 GMT
References: <8734@ames.arc.nasa.gov>
Lines: 44
Nf-ID: #R:ames.arc.nasa.gov:8734:urbsdc:28200147:000:2269
Nf-From: urbsdc.Urbana.Gould.COM!aglew    May 16 08:28:00 1988

>[Eugene Miya]:
>There is a mailing list devoted to performance measurement (@cs.wisc.edu).
>But they are mostly queueing theorists, not benchmarkers. Largely
>quiet, after all SIGMETRICS'88 is what next week?

Thanks, Eugene... there are a few SIGMETRICS members who realize that
it doesn't do you any good to simulate things with queuing theory or
Petri net models until you can make measurements to calibrate the
models. Trouble is, too many people think of measurement as trivial,
a solved problem.

And by the way: benchmarking is *NOT* the be-all and end-all of
measurement. Benchmarks must be calibrated just like models must be.
Eugene knows this, but many people act like they don't ("Why do you
want to measure my system? Can't you just run benchmarks?").

Computer system performance evaluation should start off with
measurement, with real customers, on real systems, to determine
(1) what is important to your customers, and (2) what they actually
do with the system. (2) can influence (1), as in "look, you say that
floating point speed is the most important thing to you, but you
spend 90% of your time doing integer work", but not always: "we do
integer work to fill in any slack time. A real application has 90%
slack time, but the other 10% is on time-critical paths". "Oh".

Once you have measurements, they may be abstracted into benchmarks.
Benchmarks can be used to drive simulations, or to evaluate a new system.
Benchmarks can be used to make far more complicated
"pseudo-measurements", because you can warp the time scale to use
time-costly instrumentation. But, just as simulations without realistic
workloads are useless, so are benchmarks without an underlying
rationale based on measurement. (Actually, change "useless" to "not so
useful" - sometimes they are better than nothing.)

Trouble is, most texts and papers on performance evaluation assume
that you already have the measurements - given a spectrum of points in
your measurement space, here is how you produce a reasonably good
workload sample. That's computer science. Me, I want to be a "computer
naturalist" - I want to develop measurement techniques that can be
applied easily, cheaply, to a variety of systems and workloads, over
extended periods of time.

See you in Santa Fe...