Path: utzoo!attcan!uunet!bellcore!att!emory!hubcap!mark
From: mark@hubcap.clemson.edu (Mark Smotherman)
Newsgroups: comp.benchmarks
Subject: benchmark evaluations (was Re: expanded bc results)
Message-ID: <12220@hubcap.clemson.edu>
Date: 12 Dec 90 05:36:15 GMT
Organization: Clemson University, Clemson, SC
Lines: 53


A followup on the old bc thread, and possibly the start of a new thread --


I teach students (for better or worse) that benchmarks should be:

1) Representative
   A) accurate characterization of workload
   B) exploit system structure (including compiler optimization) only
      as much as the actual workload can
2) Reproducible
   A) full system configuration (including OS and compiler versions)
      specified (e.g., SPEC reporting)
   B) system unloaded, or load specified and reproducible
   C) operational rules (e.g., compiler options, program inputs and files)
      specified
3) Compact
   A) portable, with little or no need for conversion
   B) inexpensive
   C) test files (to avoid size and privacy problems with actual data)
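Criterion 2 can be made concrete with a small harness that never reports a time without the context needed to reproduce it. The following is a minimal sketch in Python (the post itself contains no code), assuming a Unix-like host. The function name `run_benchmark` and its `workload`, `inputs`, and `trials` parameters are hypothetical. It records the system configuration (2A), the load average before the run (2B), and the operational rules (2C):

```python
import os
import platform
import sys
import time


def run_benchmark(workload, inputs, trials=5):
    """Time workload(inputs) and record the context needed to reproduce it.

    `workload` and `inputs` are hypothetical placeholders: any callable
    and whatever argument it needs.
    """
    # 2A: system configuration (OS, release, hardware, interpreter version).
    config = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }
    # 2B: load average before the run; unavailable on some platforms.
    try:
        load = os.getloadavg()
    except (AttributeError, OSError):
        load = None
    # 2C: operational rules, i.e. the exact input and number of trials.
    rules = {"inputs": repr(inputs), "trials": trials}

    times = []
    for _ in range(trials):
        start = time.perf_counter()
        workload(inputs)
        times.append(time.perf_counter() - start)

    return {"config": config, "load_before": load, "rules": rules,
            "seconds": times}


if __name__ == "__main__":
    # Same arithmetic as the bc test (2^5000/2^5000), using Python integers.
    result = run_benchmark(lambda n: 2**n // 2**n, 5000)
    print(result["config"], result["load_before"], min(result["seconds"]))
```

The usage example performs the same arithmetic as the bc test below, but through a different arbitrary-precision implementation. Its timings are therefore not comparable with bc timings, for the same reason that bc timings from different bc/dc versions are not comparable with each other.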


In trying to evaluate the bc test (i.e., echo 2^5000/2^5000 | /bin/time bc)
according to these criteria, I am deeply disturbed by the continuing
promotion of "bc" in the face of evidence that the benchmarks being run
on the different systems are not identical.  That is, the following posts
have seemingly had little impact:

amos@taux01.nsc.com (Amos Shapir) writes:
| Besides, there are several versions of "bc" (some of which do not fork "dc")
| and since the original version of "dc" was rather buggy, several versions
| of it too, some of which are major rewrites.
| The bottom line is: comparing "bc" runs on different systems is necessarily
| comparing apples and oranges (or at least plums & prunes) unless you're
| sure you have the same version of "bc", "dc", and UNIX.

ciemo@bananapc.wpd.sgi.com (Dave Ciemiewicz) writes:
| ... I just diff'ed the sources
| between BSD and SYSV versions of dc which is the compute engine for bc.
| There are changes to the SYSV version for robustness that may sway results
| one way or the other.

IMHO, "bc" has only compactness on its side, with representativeness
questionable and reproducibility totally ruled out.  Why, then, is there
continuing interest?  Are we in fact setting up comp.benchmarks for a
place in Hennessy and Patterson's "benchmarking hall of shame" for their
second edition?


On a second thread, what would you add to (or subtract from) the criteria
given above?
-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark