Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!gatech!utkcs2!de5
From: de5@ornl.gov (Dave Sill)
Newsgroups: comp.benchmarks
Subject: Re: benchmark evaluations (was Re: expanded bc results)
Message-ID: <1990Dec12.135910.27667@cs.utk.edu>
Date: 12 Dec 90 13:59:10 GMT
References: <12220@hubcap.clemson.edu>
Sender: news@cs.utk.edu (USENET News System)
Reply-To: Dave Sill <de5@ornl.gov>
Organization: Oak Ridge National Laboratory
Lines: 61

In article <12220@hubcap.clemson.edu>, mark@hubcap.clemson.edu (Mark Smotherman) writes:
>
>I teach students (for better or worse) that benchmarks should be:
>
>1) Representative
>   A) accurate characterization of workload
>   B) exploit system structure (including compiler optimization) only
>      as much as workload will be able to do so

Representativeness is only important if the results are going to be
used to predict the performance of the system on other code.

>2) Reproducible
>   A) full system configuration (including OS and compiler versions)
>      specified (e.g., SPEC reporting)
>   B) system unloaded, or load specified and reproducible
>   C) operational rules (e.g., compiler options, program inputs and files)

Not necessary in all cases, e.g., informal testing or repeated tests
of the same configuration.
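
As a rough sketch of the same-configuration case, assuming a
POSIX-style shell, an installed bc, and a date that supports +%s
(the function names repeat_timed and bc_test are made up for
illustration), one could repeat the test a few times on a single
machine and compare those runs only with each other:

```shell
#!/bin/sh
# repeat_timed N CMD...: run CMD N times and print the elapsed
# wall-clock time of each run in whole seconds.
# Assumption: `date +%s` is available (common, but not strictly POSIX).
repeat_timed() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# The bc test from this thread, wrapped so it can be passed as a command.
bc_test() {
    echo '2^5000/2^5000' | bc
}

# Run the test three times, but only if bc is actually installed.
if command -v bc >/dev/null 2>&1; then
    repeat_timed 3 bc_test
fi
```

Whole-second resolution is crude; the point is only that the numbers
are compared against other runs on the same box, not across systems.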

>3) Compact
>   A) portable, with little or no need of conversion
>   B) inexpensive
>   C) test files (to avoid size and privacy problems with actual data)

The bc test is certainly portable.

>In trying to evaluate the bc test (i.e., echo 2^5000/2^5000 | /bin/time bc)
>according to these criteria, I am deeply disturbed by the continuing
>promotion of "bc" in the face of evidence that the benchmarks being run
>on the different systems are not identical.

You're assuming that the bc test is used to compare the performance
of unlike systems.  Such a comparison is clearly not a valid use of
the test.

>IMHO, "bc" only has compactness on its side, with representativeness
>questionable and reproducibility totally ruled out.  Why, then, is there
>continuing interest?

Because folks use different tools for different jobs, as I've said
many times before.  The bc test is trivial, and is not intended to
replace full, rigorous suites such as SPEC.

>Are we in fact setting up comp.benchmarks for a
>place in Hennessy and Patterson's "benchmarking hall of shame" for their
>second edition?

Huh?

>On a second thread, what would you add (or subtract) from the criteria
>given above?

I'd try to relate the various sets of criteria to the different tasks
benchmarks are used for.  There's no "one size fits all" set of
criteria.

-- 
Dave Sill (de5@ornl.gov)
Martin Marietta Energy Systems
Workstation Support