Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!gatech!utkcs2!de5
From: de5@ornl.gov (Dave Sill)
Newsgroups: comp.benchmarks
Subject: Re: benchmark evaluations (was Re: expanded bc results)
Message-ID: <1990Dec12.135910.27667@cs.utk.edu>
Date: 12 Dec 90 13:59:10 GMT
References: <12220@hubcap.clemson.edu>
Sender: news@cs.utk.edu (USENET News System)
Reply-To: Dave Sill
Organization: Oak Ridge National Laboratory
Lines: 61

In article <12220@hubcap.clemson.edu>, mark@hubcap.clemson.edu (Mark Smotherman) writes:
>
>I teach students (for better or worse) that benchmarks should be:
>
>1) Representative
>   A) accurate characterization of workload
>   B) exploit system structure (including compiler optimization) only
>      as much as workload will be able to do so

Only important if the results are going to be used to predict the
performance of the system on other code.

>2) Reproducible
>   A) full system configuration (including OS and compiler versions)
>      specified (e.g., SPEC reporting)
>   B) system unloaded, or load specified and reproducible
>   C) operational rules (e.g., compiler options, program inputs and files)

Not necessary in all cases, e.g., informal testing or repeated tests
of the same configuration.

>3) Compact
>   A) portable, with little or no need of conversion
>   B) inexpensive
>   C) test files (to avoid size and privacy problems with actual data)

The bc test is certainly portable.

>In trying to evaluate the bc test (i.e., echo 2^5000/2^5000 | /bin/time bc)
>according to these criteria, I am deeply disturbed by the continuing
>promotion of "bc" in the face of evidence that the benchmarks being run
>on the different systems are not identical.

You're assuming that the bc test is used to evaluate the performance
of unlike systems. This is clearly not a valid use of the test.

>IMHO, "bc" only has compactness on its side, with representativeness
>questionable and reproducibility totally ruled out. Why, then, is there
>continuing interest?

Because folks use different tools for different jobs, as I've said
many times before. The bc test is trivial, and is not intended to
replace full, rigorous suites such as SPEC.

>Are we in fact setting up comp.benchmarks for a
>place in Hennessy and Patterson's "benchmarking hall of shame" for their
>second edition?

Huh?

>On a second thread, what would you add (or subtract) from the criteria
>given above?

I'd try to relate various sets of criteria with the different tasks
benchmarks are used for. There's no "one size fits all" set of
criteria.

-- 
Dave Sill (de5@ornl.gov)
Martin Marietta Energy Systems
Workstation Support