Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!gatech!lll-lcc!ames!pioneer!eugene
From: eugene@pioneer.UUCP
Newsgroups: comp.arch
Subject: Re: Dhrystones (sorry longer)
Message-ID: <342@ames.UUCP>
Date: Wed, 18-Feb-87 17:39:40 EST
Article-I.D.: ames.342
Posted: Wed Feb 18 17:39:40 1987
Date-Received: Fri, 20-Feb-87 01:00:41 EST
Sender: usenet@ames.UUCP
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 102
Keywords: performance measurement, benchmarking

From: Dick Dunn <ames!ico.ISC.COM!rcd>
Message-Id: <8702180627.AA05785@violet.ISC.COM>

I really enjoyed this message from Dick Dunn.  He raises good issues.
I have edited it a bit.

>The Dhrystone benchmark has done something positive for us:  It's a
>benchmark that tells us something about the work that a lot of us do.  From
>what I remember of where you are and what you do (gathered from 1/86
>USENIX), you're probably among a group of card-carrying number crunchers.
>For my part, I count the interval between my forays into numeric work in
>days or weeks--and even that is not <heavy> numeric work, just the
>existence of a %...f in a program I run!  There, Dhrystone gives a measure
>that doesn't involve floating point at all, which is nice.
>
>Another nice thing about Dhrystone is that, like Whetstone, it gives us a
>measure of a complete system--CPU, bus, memory/cache, compiler, and even OS
>to the extent that it might intrude (hopefully little).  It's a lot better
>than just raw CPU measurements.

The idea that the Dhrystone does something positive gives a false sense
of security.  The Whetstone is very little better.  No, I am not a
full-time card-carrying number cruncher.  In fact, Crays and other big
machines do not spend most of their time crunching numbers; they spend
it fetching from memory like everyone else (only more efficiently:
chaining, etc.)
There is a myth which Dick propagates below when he mentions "smaller
machines."  Most architectures differ little from the "von Neumann"
model.  A Cray differs little architecturally from a 6502.  Unique
architectures
include: the ICL DAP, the Goodyear MPP, the STARAN, the new Multiflow,
Hypercubes (too many), the ILLIAC IV, Dennis and Arvind dataflow
proposals, the Manchester and the Japanese and TI dataflow machines, the
Alliant, Cedar (not the Xerox system, but U. Ill.), the IBM RP3, the
Ultracomputer, C.mmp, Cm*, and numerous other projects.

The issue of just what Weicker's program measures is controversial.
What it measures is certainly not separable into components, so I think
that hope is in vain.  Interesting that you did not mention the next
paragraph as part of this.  Removing floating point does not help the
problem: you should be able to similarly separate CPU from bus, from
cache, from memory, and so forth, put it all back together and you have
the System, or do you?

>The downfall of Dhrystone--and most other simple benchmarks--is that it
>attempts to reduce the performance of a system to a single number.
> . . More elaborating on the failure of a single figure of merit (SFM)...

No disagreement here.

>I think the discussion about the effects of paging (in the articles you and
>I wrote) pretty well indicates the problem--do we twiddle Dhrystone so that
>it <does> reflect the cost of paging as much as possible, or do we arrange
>it so that it shows the paging as little as possible?  The paradox (your
>view of it) arises because we're trying to use one number to describe a
>position in two-space (where the coordinates are raw cpu performance and
>real performance under nonzero paging load).
>
>For my part, I'd just as soon toss in some articles to confuse the issue
>whenever people try to develop a single omniscient metric for things that
>are so obviously multi-valued.  If they stay confused long enough, maybe
>they'll start to think, just out of desperation.  (I know it's a lot to
>hope for!)  Paging is a good example for now, but there are quite a few
>other aspects of performance.

I think we should measure something (not Dhrystones) in a broad, yet
consistent manner.  It is my thesis, and I am touring important sites
around the country trying to drum up support, that we have to test
performance the way we do functional testing.  Consider that compiler
validation suites have hundreds of systematic test programs.  True, they
are not perfect, but I think they give a better picture of how a
compiler works than single programs do.  We must do the same for
performance.  The marketing types might fear some degree of consistency,
but the EPA ratings did not destroy the auto market (they are a
two-figure measure of merit).  Everybody has their own application.
With a mass of numbers, we will separate out those who are really
serious about understanding just what makes a machine run.  The smaller
machines you allude to (next) can certainly hawk their machines based on
some small set of criteria.  They are not trying to compete with more
expensive machines.  I am only asking for a consistent and systematic
form of measurement like a Meter, a Second, a Kilogram.  Not a 3 accord
inch.  P.S. Paging is only one aspect; I used it as an example.
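
To make the suite idea concrete, here is a minimal sketch in Python (my
choice for brevity; nothing above prescribes a language).  The kernel
names, the SUITE table, and run_suite are all hypothetical, invented
for illustration, and four toy kernels are nowhere near the hundreds of
systematic tests a real suite would need.  The point is only the shape
of the result: one timing per kernel, not a single figure of merit.

```python
import time


def integer_kernel(n):
    """Integer arithmetic only: sum of i mod 7."""
    total = 0
    for i in range(n):
        total += i % 7
    return total


def float_kernel(n):
    """Floating-point multiply and add."""
    x = 0.0
    for i in range(n):
        x += i * 0.5
    return x


def memory_kernel(n):
    """Strided reads over a list, to lean on memory rather than arithmetic."""
    data = list(range(n))
    total = 0
    for i in range(0, n, 16):
        total += data[i]
    return total


def call_kernel(n):
    """Procedure-call overhead: n calls to a trivial function."""
    def bump(a):
        return a + 1

    v = 0
    for _ in range(n):
        v = bump(v)
    return v


# Hypothetical suite table: each entry is measured and reported separately.
SUITE = {
    "integer": integer_kernel,
    "float": float_kernel,
    "memory": memory_kernel,
    "call": call_kernel,
}


def run_suite(n=100000, repeats=3):
    """Return {kernel name: best wall-clock seconds over `repeats` runs}."""
    results = {}
    for name, kernel in SUITE.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(n)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    for name, seconds in run_suite().items():
        print(f"{name:8s} {seconds:.6f} s")
```

Two machines can then be compared kernel by kernel.  Collapsing the
table into one number is exactly the step this sketch leaves out.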

>I'd particularly like to be able to create grief for smaller machines which
					     ^^ like your use of the word
>have particular benchmark-oriented features.  The 286 systems are the best
>example--you get to choose one of perhaps six different "computational
>models" depending on "how fast you want the program to run" versus "how
>useful you want it to be".  We need benchmarks which force such machines to
>be tested on real problems--and Dhrystone is too tiny to help there.

We are working on this latter point.  The real tough nut is: what is a
real problem?  How do you characterize them?  What about the COBOL,
FORTRAN, C, LISP, Prolog and other pieces of code out there which differ
a lot: how do you compare them?  [Naw, that's written in a language
which means nothing to me...]

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene