Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!lll-winken!uunet!convex!rosenkra
From: rosenkra@convex.com (William Rosencranz)
Newsgroups: comp.benchmarks
Subject: Re: benchmarks (SPECmarks)
Summary: why all the fuss?
Keywords: validity meaning
Message-ID: <108988@convex.convex.com>
Date: 20 Nov 90 00:43:04 GMT
References: <7581@eos.arc.nasa.gov> <1146@dg.dg.com> <7589@eos.arc.nasa.gov> <1148@dg.dg.com>
Sender: news@convex.com
Organization: Convex Computer Corporation; Richardson, TX
Lines: 99


---
i dunno, maybe i am just daft, so ignore this if you beg to differ. it
is not meant to offend, so if you read something into it, pls reread.
it is also my opinion, not that of my employer...

i have been reading this newsgroup for a week or so, and SPECmark is
the current hot topic. i am a bit confused over some of the issues
raised, so maybe i'll raise some of my own.

first off: what are SPEC ratings (or any standard bm ratings for that
matter) meant to do? answer this question in your mind first before
proceeding...

i really see no point whatsoever in relating an execution time on one
machine to that of another "standard" machine, no matter how standard,
(except possibly the old "that's the way we've ALWAYS done it before",
e.g. "MIPS"), just to come up with some single "standard" unit of
performance.

if I were buying (instead of selling :-), i'd want to see wallclock and
cpu times, because i, as a human being, can relate to time far more easily
than to "SPECs" or whatever. if something runs in 10 seconds, compared to
100 seconds, i know i can sit and wait, call it "interactive". if
something runs in 10 min vs 1 hour, i know i can go out to lunch in the
latter case. a SPEC of 1.345 vs a SPEC of 4.345 means nothing, until
i translate to time anyway. time is easier to "heft", as it were.
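
to make that translation concrete, here is a minimal python sketch. it
assumes the usual definition, ratio = reference time / measured time. the
function name and the 3600-second reference time are made up for
illustration; they are not real SPEC figures.

```python
def seconds_from_ratio(reference_seconds, ratio):
    """Recover run time on the machine under test, assuming
    ratio = reference_seconds / measured_seconds."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return reference_seconds / ratio


# hypothetical reference time for one benchmark (NOT a real SPEC number)
REFERENCE_SECONDS = 3600.0

for ratio in (1.345, 4.345):
    secs = seconds_from_ratio(REFERENCE_SECONDS, ratio)
    print(f"ratio {ratio}: about {secs / 60:.1f} minutes")
```

the point is only that the ratio says nothing until you know the
reference time it was divided into.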

further, i'd want to see how the "standard" bm results scale with
problem size, especially on cache-based memory systems. because a
buy decision based on a single number could come back to haunt me.
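
a rough sketch of the kind of sweep i mean, in python. the kernel here is
a hypothetical stand-in (a sum of squares over a list), not any SPEC
code, and the sizes are arbitrary; swap in your own workload. it reports
both wallclock and cpu time, plus time per element so you can see where
the curve bends.

```python
import time


def kernel(n):
    """Stand-in workload (hypothetical): sum of squares over n elements."""
    data = list(range(n))
    return sum(x * x for x in data)


def measure(n):
    """Return (wallclock seconds, cpu seconds) for one run of kernel(n)."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    kernel(n)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return wall, cpu


def sweep(sizes):
    """Run the kernel at each size; rows are (n, wall, cpu, wall per element)."""
    rows = []
    for n in sizes:
        wall, cpu = measure(n)
        rows.append((n, wall, cpu, wall / n))
    return rows


for n, wall, cpu, per_elem in sweep([10_000, 100_000, 1_000_000]):
    print(f"n={n:>9}  wall={wall:.4f}s  cpu={cpu:.4f}s  per-element={per_elem:.2e}s")
```

if per-element time stays flat, the machine scales; if it jumps at some
size, you have probably fallen out of cache (or into paging).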

i'd also want to know what sort of performance enhancements i could
expect if i wanted to put 1 hour, 1 day, and 1 week's effort into
the optimization of any particular code, if possible.

i'd also want to compare a vendor's peak performance with how well
it did on standard bm's or on my own.

finally, i'd want to see what sort of support i can expect from the
vendor. granted, pre-sales and post-sales activities can vary
greatly, but i think i can shake out a vendor during the sales
cycle, as most savvy buyers can.

why the need for complication, other than perhaps marketing fog? and
believe me, if i see 2 or 3 systems with uni-number ratings within
say 5% of each other, i sure as heck would not say "these machines
are identical, so let's buy the cheaper one because it has better
price/SPECperformance". i'd want to look at the raw data anyway, and
probably run my sort of workload on them to really get an idea of
what i can expect. similarly, if i see two machines that differ by
a lot in some particular individual tests, i'd want to know why.
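
here is a small python sketch of why the raw data matters. SPECmark is a
geometric mean of the per-benchmark ratios, and the two sets of ratios
below are invented for illustration (they are not measurements of any
real machine). they collapse to the same single number even though one
machine is uneven across tests.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive per-benchmark ratios."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("need at least one ratio, all positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# invented per-test ratios for two hypothetical machines
machine_a = [2.0, 2.0, 2.0, 2.0]
machine_b = [8.0, 0.5, 4.0, 1.0]

for name, ratios in (("A", machine_a), ("B", machine_b)):
    print(f"machine {name}: single number {geometric_mean(ratios):.2f}, "
          f"worst test {min(ratios)}, best test {max(ratios)}")
```

if your workload looks like machine B's worst test, the matching
single number is no comfort at all.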

in fact, unless i expect to buy a machine to do just one job (or one
job at a time), i would more than likely ignore these uni-job ratings
altogether, since, from my experience, in "real life", multi-job
thruput is where productivity gains are made, and is where strengths
and weaknesses in architectures (e.g. cache vs widely interleaved memory)
are really determined anyway (in many, if not most cases). probably
without exception, the SPEC'd machines are general purpose systems,
especially workstations, which would get lots of different tasks from
text processing to dbms to finite element analysis to ...

the basic problem i see with these uni-number ratings is that
people can make up their minds, even subconsciously, based on a
first impression. this is human nature. you always have that in
the back of your mind. and it is easy to just say "2 > 1.5"
rather than "based on some real workload, and on problem size,
and on vendor support, and on application availability, and on
whatever, 2 is not necessarily > 1.5".

distilling machine performance down to one number tends to make it
easy to abuse it, to misrepresent it. if in fact these sorts of
performance quotients are (good faith?) attempts to enlighten,
then why not enlighten thru education rather than simplification?
surely we can give more credit to the intellect of people making
buy decisions than that?

why not a "SPECparagraph" that sheds more light? consider this my
entry in the standard bm sweepstakes :-).

please don't argue the merits of standards. i am well aware of the
risks and benefits therein. i also know that shopping for supercomputers
is different from shopping for workstations, though in my mind buying
100 w/s at $20k a pop is still spending $2M and it might be better to
buy 100 w/s at $10k and a central system at $1M with my $2M. the SPEC
numbers in no way help me here, i think. having spent the last 15 years
dealing with supercomputers, and only 5 or 6 with workstations and pc's,
i am somewhat biased, i suppose, though i like to at least think i have
an open mind about these sorts of issues.

personally, i think i'll wait for the SPECthroughput bm...

-bill rosenkranz
rosenkra@convex.com

--
Bill Rosenkranz            |UUCP: {uunet,texsun}!convex!c1yankee!rosenkra
Convex Computer Corp.      |ARPA: rosenkra%c1yankee@convex.com