Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!sol.ctr.columbia.edu!emory!utkcs2!de5
From: de5@ornl.gov (Dave Sill)
Newsgroups: comp.benchmarks
Subject: Re: Don't use bc (was: More issues of benchmarking)
Message-ID: <1990Dec6.132203.2341@cs.utk.edu>
Date: 6 Dec 90 13:22:03 GMT
References: <1990Dec3.191756.15280@cs.utk.edu> <39871@ucbvax.BERKELEY.EDU> <1990Dec3.204027.16794@cs.utk.edu> <109872@convex.convex.com>
Sender: news@cs.utk.edu (USENET News System)
Reply-To: Dave Sill
Organization: Oak Ridge National Laboratory
Lines: 111

In article <109872@convex.convex.com>, patrick@convex.COM (Patrick F.
McGehearty) writes:
>
>I suggest that the bc benchmark is worse than worthless for several
>reasons.
>
>First, as pointed out, it is not measuring the raw add/multiply rate
>of the machine.  Thus it is worthless for measuring raw arithmetic
>speed.

Granted.

>It measures the "multi-precision arithmetic" capabilities as
>implemented by dc, which is mostly subroutine call/returns.

So how does that make it worthless?  E.g., suppose I want to measure a
rough mix of arithmetic, function calls, context switches, etc.
Doesn't the bc test depend on exactly those kinds of tasks?

>Further, I have never seen a system where bc/dc is a significant user
>of cycles.  Thus, the less than expert user will believe the
>measurements represent something different from reality.

I don't see how you got from the premise to your conclusion.  As for
the "less than expert user", if we fall into the trap of making
twit-proof benchmarks and outlawing anything that's not obvious and
general, we're not going to get anywhere.

>Second, for most machines, little architecture or compiler work has
>been (or should be) done to optimize this application.

Lack of optimization does not invalidate a benchmark.

>So you will not be able to tell the difference between those machines
>which have features useful to your application and those which do
>not.

The bc test shouldn't be used to predict the performance of one's
application unless one has specifically determined that such a
comparison is valid.  Granted.

>Third, widespread reporting of such a benchmark will encourage other,
>less knowledgeable buyers to read more into the numbers than should
>be read.

The twit-proof trap, again.  In this case, though, it's not really a
big issue because we're not proposing the bc test be added to the SPEC
suite.  It's a limited-usefulness, extremely trivial benchmark.

>Fourth, if buyers use the benchmark, then vendors will be encouraged
>to put resources into enhancing their performance on it instead of
>enhancing something useful.  This is a bad thing and the primary
>reason why I am posting.  Bad benchmarks lead to lots of wasted
>effort.

Oh, come on!  This is the twit-proof argument again.  If someone's
stupid enough to use bc to specify minimum performance in a
procurement, then they absolutely deserve what they get.  I'm
certainly not going to lose any sleep over the possibility.

>I use the Whetstone benchmark as a "proof by example".

It seems like you're using it as a proof by counterexample.

> :
>In these cases, the development efforts were not totally wasted.
>Efforts to speed up the transcendental functions (SIN, COS, etc) used
>in the Whetstones helped those applications which used the
>transcendentals.  I see no value to most users of general purpose
>computing (scientific or business) in optimizing bc/dc.

No vendor I know of is stupid enough to devote resources to optimizing
bc just because a handful of people use it as a trivial benchmark.
Even if it happened, though, what harm would there be in that?

>Many procurements require some minimum rate on some well-known
>benchmark for a vendor to be even allowed to bid.  If you can't make
>this number, you don't get a chance to show how good your
>architecture and compilers are for executing the customer's real
>application.  There are even a significant number of customers who do
>not run benchmarks before purchase.  They just rely on quoted numbers
>for well-known benchmarks.  It is our duty as responsible
>professionals to develop and measure benchmarks that mean something
>and which explain what they mean.

Exactly.  This is what the major commercial suites are for.  As I've
already said, though, one doesn't want to have to carry a SPEC tape
around wherever one goes.  It's not always feasible to run a major,
rigorous suite, and in many cases it's overkill.

For example (I've already pointed out this use of the bc test), let's
say I'm working at my DECstation one day and it seems to be sluggish.
Is it just my imagination, or is it really slower?  Should I devote a
day or so to running the SPEC suite just to find out, or should I type
"echo 2^5000/2^5000 | /bin/time bc" and compare it to previous runs on
the same machine or to a similarly configured system?
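To be concrete, here's roughly what I mean.  This is just a sketch:
the three runs and the output redirection are arbitrary, and it
assumes a Bourne-style shell and the stock /bin/time, which prints its
report on stderr:

    #!/bin/sh
    # Time the bc test a few times.  bc's answer (1) goes to
    # /dev/null, so only the /bin/time numbers show on the terminal.
    for run in 1 2 3
    do
        echo 2^5000/2^5000 | /bin/time bc > /dev/null
    done

If the user and system times are noticeably worse than the numbers I
jotted down when the machine felt normal, something has changed; if
not, it was probably my imagination.  Either way it costs a minute,
not a day.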
>If you really must have a "quick and dirty" benchmark, how about the
>following:
>
> [FORTRAN program deleted]

Some of my systems don't have FORTRAN compilers.  It's too long to
remember and type in easily.  And it doesn't exercise a mix of
function calls, arithmetic, forks, context switches, etc.

As I've already said: benchmarks are tools.  They come in all sizes,
perform many different tasks, vary greatly in quality, etc.  The bc
benchmark doesn't do everything, and, like any tool, it can be used
for things it's not intended for, but it *does* have its niche.

-- 
Dave Sill (de5@ornl.gov)
Martin Marietta Energy Systems
Workstation Support