Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!gatech!lll-lcc!ames!pioneer!eugene
From: eugene@pioneer.UUCP
Newsgroups: comp.arch
Subject: Re: Dhrystones (sorry longer)
Message-ID: <342@ames.UUCP>
Date: Wed, 18-Feb-87 17:39:40 EST
Article-I.D.: ames.342
Posted: Wed Feb 18 17:39:40 1987
Date-Received: Fri, 20-Feb-87 01:00:41 EST
Sender: usenet@ames.UUCP
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 102
Keywords: performance measurement, benchmarking

From: Dick Dunn
Message-Id: <8702180627.AA05785@violet.ISC.COM>

I really enjoyed this message from Dick Dunn.  He raises good issues.
I have edited it a bit.

>The Dhrystone benchmark has done something positive for us:  It's a
>benchmark that tells us something about the work that a lot of us do.  From
>what I remember of where you are and what you do (gathered from 1/86
>USENIX), you're probably among a group of card-carrying number crunchers.
>For my part, I count the interval between my forays into numeric work in
>days or weeks--and even that is not numeric work, just the
>existence of a %...f in a program I run!  There, Dhrystone gives a measure
>that doesn't involve floating point at all, which is nice.
>
>Another nice thing about Dhrystone is that, like Whetstone, it gives us a
>measure of a complete system--CPU, bus, memory/cache, compiler, and even OS
>to the extent that it might intrude (hopefully little).  It's a lot better
>than just raw CPU measurements.

The belief that the Dhrystone does something positive gives a false sense
of security.  The Whetstone is very little better.  No, I am not a
full-time card-carrying number cruncher.  In fact, Crays and other big
machines do not spend most of their time crunching numbers; they spend it
fetching from memory like everyone else (only more efficiently: chaining,
etc.).  There is a myth which Dick propagates below when he mentions
"smaller machines."  Most architectures differ little from the "von Neumann."
A Cray differs little architecturally from a 6502.  Unique architectures
include: the ICL DAP, the Goodyear MPP, the STARAN, the new Multiflow,
Hypercubes (too many), the ILLIAC IV, the Dennis and Arvind dataflow
proposals, the Manchester, Japanese, and TI dataflow machines, the
Alliant, Cedar (not the Xerox system, but U. Ill.), the IBM RP3, the
Ultracomputer, C.mmp, Cm*, and numerous other projects.

The issue of just what Weicker's program measures is controversial.  It is
certainly not separable, so I think that hope is in vain.  Interesting that
you did not mention the next paragraph as part of this.  Removing floating
point does not help the problem: you should be able to similarly separate
CPU from bus, from cache, from memory, and so forth, put it all back
together, and you have the System.  Or do you?

>The downfall of Dhrystone--and most other simple benchmarks--is that it
>attempts to reduce the performance of a system to a single number.
> . . . More elaborating on the failure of a single figure of merit (SFM) ...

No disagreement here.

>I think the discussion about the effects of paging (in the articles you and
>I wrote) pretty well indicates the problem--do we twiddle Dhrystone so that
>it reflects the cost of paging as much as possible, or do we arrange
>it so that it shows the paging as little as possible?  The paradox (your
>view of it) arises because we're trying to use one number to describe a
>position in two-space (where the coordinates are raw cpu performance and
>real performance under nonzero paging load).
>
>For my part, I'd just as soon toss in some articles to confuse the issue
>whenever people try to develop a single omniscient metric for things that
>are so obviously multi-valued.  If they stay confused long enough, maybe
>they'll start to think, just out of desperation.  (I know it's a lot to
>hope for!)  Paging is a good example for now, but there are quite a few
>other aspects of performance.

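To make the quoted two-space point concrete, here is a minimal sketch in
Python.  Everything in it is an assumption for illustration: the machine
names, the scores, and the weighted-sum formula are made up and describe no
real system.  It only shows that once two coordinates are collapsed into one
number, the "winner" depends on how the collapse is weighted.

```python
# Hypothetical scores (higher is better). These are invented for
# illustration and are NOT measurements of any real machine.
machines = {
    "machine_a": {"raw_cpu": 10.0, "under_paging": 2.0},
    "machine_b": {"raw_cpu": 6.0, "under_paging": 5.0},
}


def single_figure(scores, paging_weight):
    """Collapse the two coordinates into one number.

    paging_weight is an arbitrary choice between 0.0 (only raw CPU counts)
    and 1.0 (only performance under paging load counts).
    """
    return ((1.0 - paging_weight) * scores["raw_cpu"]
            + paging_weight * scores["under_paging"])


def winner(paging_weight):
    """Return the machine name with the highest single figure of merit."""
    return max(machines,
               key=lambda name: single_figure(machines[name], paging_weight))


for weight in (0.25, 0.75):
    print(weight, winner(weight))
```

With these made-up scores the ranking flips between the two weights, which
is the paradox in miniature: neither weighting is wrong, but a single number
hides the choice.
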
I think we should measure something (not Dhrystones) in a broad, yet
consistent manner.  It is my thesis, and I am touring important sites
around the country trying to drum up support, that we have to test
performance like we do functional testing.  Consider that compiler
validation suites have hundreds of systematic test programs.  True, they
are not perfect, but I think they give a better picture of how a compiler
works than single programs.  We must do the same for performance.  The
marketing types might fear some degree of consistency, but the EPA ratings
did not destroy the auto market (they give two figures of merit).
Everybody has their own application.  With a mass of numbers, we will
separate out those who are really serious about understanding just what
makes a machine run.  The makers of the smaller machines you allude to
(next) can certainly hawk their machines based on some small set of
criteria.  They are not trying to compete with more expensive machines.  I
am only asking for a consistent and systematic form of measurement like a
Meter, a Second, a Kilogram.  Not a 3 accord inch.

P.S. Paging is only one aspect; I used it as an example.

>I'd particularly like to be able to create grief for smaller machines which
        ^^ like your use of the word
>have particular benchmark-oriented features.  The 286 systems are the best
>example--you get to choose one of perhaps six different "computational
>models" depending on "how fast you want the program to run" versus "how
>useful you want it to be".  We need benchmarks which force such machines to
>be tested on real problems--and Dhrystone is too tiny to help there.

We are working on the latter.  The real tough nut is: what is a real
problem?  How do you characterize one?  What about the COBOL, FORTRAN, C,
LISP, Prolog, and other pieces of code out there which differ a lot: how do
you compare them?  [Naw, that's written in a language which means nothing
to me...]

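As a rough sketch of what "test performance like we do functional testing"
might look like in practice, here is a toy suite in Python.  The three
kernels, their sizes, and the category names are hypothetical stand-ins, not
a proposed standard; a serious suite would have hundreds of systematic
tests.  The point is only the shape of the result: one timing per named
test, never folded into a single number.

```python
import time


def int_loop(n=20000):
    """Hypothetical integer kernel: arithmetic in a tight loop."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def string_ops(n=2000):
    """Hypothetical string kernel: build and join many small strings."""
    return len("".join(str(i) for i in range(n)))


def memory_walk(n=20000):
    """Hypothetical memory kernel: walk a list in reverse order."""
    data = list(range(n))
    return sum(data[::-1])


# Categories and membership are illustrative assumptions.
SUITE = {
    "integer": [int_loop],
    "string": [string_ops],
    "memory": [memory_walk],
}


def time_once(fn):
    """Return the wall-clock seconds taken by one call of fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_suite(suite, repeats=3):
    """Time every test; keep the best of `repeats` runs for each one."""
    report = {}
    for category, tests in suite.items():
        for test in tests:
            report[(category, test.__name__)] = min(
                time_once(test) for _ in range(repeats))
    return report


# Report one line per test instead of one number per machine.
for (category, name), seconds in sorted(run_suite(SUITE).items()):
    print(f"{category:8s} {name:12s} {seconds:.6f} s")
```

Taking the minimum over a few repeats is one common way to reduce noise
from the rest of the system; it is a design choice here, not the only
defensible one.
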
From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene