Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!sdd.hp.com!hplabs!hpda!hpcuhc!spuhler
From: spuhler@hpcuhc.cup.hp.com (Tom Spuhler)
Newsgroups: comp.benchmarks
Subject: Re: Re: unbc - A New!, Improved! bc benchmark (nope)
Message-ID: <115440004@hpcuhc.cup.hp.com>
Date: 20 Dec 90 01:28:46 GMT
References: <7710@eos.arc.nasa.gov>
Organization: GSY Systems Performance Section
Lines: 79

# >for your faster CPU's?  Does management want a richer instruction mix to
# 
# Er, sorry, I must be dense, but where does the "richer instruction mix"(tm)
# come in (sounds like coffee, thank god I drink tea).  Seems like more of

Come on, Eugene, you're tripping over the easy ones:-)  A richer
instruction mix means a more varied one, using a larger subset of the machine's
instructions.  That's not particularly interesting in itself, as the important criterion is
how well the tested instruction mix matches your expected workloads (for richer
or poorer:-), but I get more warm fuzzies from tests that exercise the
'richer' mixes than the 'poorer' ones, as real-life usage tends to be on the
richer side (for the kinds of computers I'm interested in).  It was common
terminology around here; I didn't invent it.  (Now, as to the concept of
"creamier" code, I'll take some blame for that.)

# the same.  Do you work perchance in a marketing department?  Longer running?
# Longer is not necessarily better (no sex jokes please).  Seems this could

Sorry, no, to the marketing question.  Longer is better in that it
reduces the effect of the reporting mechanism's limited precision (in this
case /bin/time) and of startup effects (something of concern
in the 'bc' benchmark).  When run times drop below
a couple of seconds, I personally start to worry about the precision of
/bin/time.  I like them to run at least 10 seconds.  Unfortunately, I
didn't achieve that goal with 2^9999/3^6308.  On some systems, I expect
it can run in less than a second, but I was limited by the 'bc' program
and my interest in simplicity.  Longer is not 'necessarily' better, but I find
it usually is for accuracy of results, although 'longer' may reduce the
number of times it's run, or its usefulness, which may be more
important.
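To make the precision argument concrete, here is a small Python sketch (not from the original measurements). It assumes, hypothetically, a timer whose reading can be off by up to one 0.1 s tick and a fixed 0.05 s startup cost per run; both numbers are placeholders, not properties of /bin/time or bc.

```python
# Sketch: how run length affects the worst-case error of a coarse timer.
# ASSUMPTIONS (hypothetical, not measured): the timer can be off by up to
# one tick of TICK seconds, and each run pays a fixed STARTUP cost.
TICK = 0.1      # hypothetical timer resolution, seconds
STARTUP = 0.05  # hypothetical fixed startup cost, seconds


def worst_case_quantization_error(run_seconds: float, tick: float = TICK) -> float:
    """Relative error if the reading can be off by up to one tick."""
    return tick / run_seconds


def startup_fraction(work_seconds: float, startup: float = STARTUP) -> float:
    """Share of the measured time spent on startup rather than the tested work."""
    return startup / (startup + work_seconds)


if __name__ == "__main__":
    for t in (0.5, 2.0, 10.0, 60.0):
        print(f"{t:5.1f} s run: timer error <= {worst_case_quantization_error(t):6.1%}, "
              f"startup share = {startup_fraction(t):6.1%}")
```

Under these assumed numbers, a 10 second run keeps the timer's worst-case error near 1%, while a half-second run can be off by 20%, which is the intuition behind wanting runs of at least 10 seconds.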

# Longer is not necessarily better (no sex jokes please).  Seems this could
# be optimized as well.  Fortunately (?) I didn't see the beginnings of the 

Optimizable?  Oh sure.  That is always true.  Vendors could hard-code
the answer.  It's a question of ease, likelihood, and dependence.
How hard is it to optimize for this case?  2^9999/3^6308 is
harder to optimize for than 2^5000/2^5000, assuming the optimization has to
cover more than just the hard-coded case (which is easy to detect) and stay
somewhat consistent with the intent of 'bc'.  How likely is someone to do
something like that?  It depends on how hung up the world gets on a single
benchmark.  How likely is someone to optimize for Dhrystone?  (That seems to
have happened.)  It's all a matter of context.
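The difference in how easy the two expressions are to special-case can be shown with a small Python sketch. The function `power_quotient` and its shortcut are hypothetical, written only to illustrate the argument, and are not taken from any real bc; it mimics bc's default scale=0 integer division and assumes bases of at least 2.

```python
# Sketch: why a shortcut is easy for 2^5000/2^5000 but not for 2^9999/3^6308.
# Hypothetical illustration only; not code from any real bc implementation.


def power_quotient(a: int, m: int, b: int, n: int) -> int:
    """Integer quotient of a**m / b**n, like bc with scale=0.

    Assumes bases >= 2 and non-negative exponents.
    """
    if a < 2 or b < 2 or m < 0 or n < 0:
        raise ValueError("sketch assumes bases >= 2 and non-negative exponents")
    if a == b:
        # Same base: cancel the exponents without building either big number.
        return a ** (m - n) if m >= n else 0
    # Different bases with nothing to cancel: do the full big-number work.
    return a ** m // b ** n


print(power_quotient(2, 5000, 2, 5000))  # shortcut path
print(power_quotient(2, 9999, 3, 6308))  # full bignum path
```

For 2^5000/2^5000 the shortcut returns 1 without any big-number arithmetic. For 2^9999/3^6308 the bases share no factor, so both operands (each roughly 3,000 decimal digits long) have to be built, even though the scale=0 quotient that comes out is a small integer.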
  
# >	It is better to have some data, no matter how limited, as long
# >	as you understand it, than no data at all.
# 
# Nope.  Beg to disagree.  It can be more damaging.  I think someone is suing
# someone else over performance claims, getting nasty.  
# Note: in a first post, I cited the APL benchmark (Gaussian sum) where
# the adds were all replaced by the simple (n+1)n/2 formula (n was = 256).
# 
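The APL example quoted above is easy to reproduce. In this Python sketch (function names are hypothetical), the loop is the work the benchmark meant to time, and the closed form n(n+1)/2 is the substitution that made the timing meaningless; both agree for the cited n = 256.

```python
# Sketch of the APL "Gaussian sum" case: the benchmark was meant to time
# n additions, but the closed form gives the same answer with one multiply
# and one divide, so the timing no longer measures the adds at all.


def sum_by_loop(n: int) -> int:
    """What the benchmark intended to time: n separate additions."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_by_formula(n: int) -> int:
    """The substitution: the closed form (n+1)n/2."""
    return n * (n + 1) // 2


N = 256  # the value of n cited in the quoted post
assert sum_by_loop(N) == sum_by_formula(N) == 32896
```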
We always have to live with imperfect information.  True, the results of
benchmarking your application(s) on a variety of vendor machines
in a variety of configurations would be ideal, but that can be a little
expensive to achieve.  Something like the bc or nbc benchmarks may not be
very good, but they are cheap to run, and results from a good number of
machines are available.  Note that the results of the two efforts may be no
more (or less) useful to someone else in determining the relative
performance of the tested boxes.  And guess which one cost less.
Using bc, or better yet nbc, can help classify systems and direct other
investigative efforts.  The combination of bc and nbc results is
considerably more useful than either one alone.  Keep adding more
benchmarks and you can develop a performance profile of a system.  Does
SPEC alone allow one to characterize the performance of a system?
Definitely not.  Does it help?  Sure.  How about TPC-A?  For any single
characterization, one can cite exceptions.  Only the complete universe
of information is universally useful.  Performance information is
damaging only if it is misused (which happens a lot).
["there is no enlightenment until there is total enlightenment"].  
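The idea of building a performance profile from several benchmarks, rather than trusting any single one, can be sketched in Python. This is an illustration only: the function names are hypothetical, no real benchmark data is included, and the geometric mean is just one common way to summarize speed ratios (the kind of summary SPEC reports). The min and max are kept alongside it so the single number doesn't hide the spread.

```python
# Sketch: keep a per-benchmark profile instead of collapsing to one number.
# Hypothetical helper functions; callers supply their own measured times.
from math import prod


def speed_ratios(reference: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Per-benchmark speedup of candidate over reference (reference time / candidate time)."""
    common = reference.keys() & candidate.keys()
    return {name: reference[name] / candidate[name] for name in sorted(common)}


def summarize(ratios: dict[str, float]) -> tuple[float, float, float]:
    """Geometric mean of the ratios, plus the min and max so the spread stays visible."""
    if not ratios:
        raise ValueError("no benchmarks in common")
    values = list(ratios.values())
    geo_mean = prod(values) ** (1.0 / len(values))
    return geo_mean, min(values), max(values)
```

For a purely hypothetical machine that is 4x faster than the reference on bc but no faster on nbc, the geometric mean is 2x, while the min and max of 1x and 4x show how much that single figure leaves out.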

# It's hard to understand the behavior of some benchmark results, even by
# some of the programmers who wrote a given benchmark or compiler.

and it's even harder to come up with a single all-singing, all-dancing
benchmark that will allow anyone to evaluate the performance of a
variety of boxes running whatever applications they choose.

-Tom Spuhler,  Spuhler@cup.hp.com