Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!usc!apple!agate!shelby!eos!eugene
From: eugene@eos.arc.nasa.gov (Eugene Miya)
Newsgroups: comp.benchmarks
Subject: Preliminary Miya babblings on benchmarking
Message-ID: <7590@eos.arc.nasa.gov>
Date: 15 Nov 90 07:40:34 GMT
Reply-To: eugene@eos.UUCP (Eugene Miya)
Organization: NASA Ames Research Center, Calif.
Lines: 78

While attending SPEC it was very clear to me:

1) We (all benchmarkers) are great philosophizers, no, talkers, well,
something less than doers. This is a hunch; my opinion can change at
any time.

2) None of us really knows what we are doing, but it is very clear to
me that measurement is part of a process of consensus. I can't come up
with a solution unless you agree. (A prof at UCB's history department
enlightened me about the story of the Meter 2 years ago.) We are blind
philosophers trying to count horse's teeth.

Basic problems as I see them. I have some of this stuff written down
and really elaborated, but I've never finished the paper (4+ years).

I. The tension of simplicity.
We need things to be simple: we need to port codes, we need to
understand what we are testing, and we need to be able to comprehend
the results {and make use of them}. Everyone wants to predict, but our
methods of description are poor. No glory in descriptions. We seek
linear models. That's a real problem. We see the subproblems of
portability, statistics, social consequences. I stop here.

[Portability is amusing. I've tried to collect the world's smallest
benchmarks in a few cases: APL, dc, and others. Hey, I'm a theoretical
mathematician by training: if a benchmark exists, there must exist a
smallest one. If a smallest exists, then.....I'll talk about it later.]

II. The problem of equivalence.
"How do you compare apples to oranges (Bananas or Crays[tm])?"
I think it is possible, but I note two problems:

1) Citing the A to O comparison is a great way to kill a discussion.
The biologists got beyond that point.

2) If you get to an "apples to Apples" comparison, I find people argue
"Macintoshes can't be compared to Golden Delicious" and you are back to
where you started.

Subproblems: what's a 'real' program? John Hennessy brought this up.
The problem is NOT what's "real"; the problem is what's
"representative." A benchmark is a surrogate. You typically don't want
the "real program": it takes too long to port, to run, etc. (see I).

Also here are the issues of "repetition" and "reproducibility." Two
subtle issues.

"Optimization" is another issue. I think we need to develop the
distinction of actual versus virtual work. Can of worms. PERFECT Club
knows this. David Kuck (UIUC) taped in Reno.

I jokingly call my stuff "