Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!seismo!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP
Newsgroups: comp.arch,comp.sys.misc
Subject: Re: 01/31/87 Dhrystone Results and Source
Message-ID: <112@winchester.mips.UUCP>
Date: Sun, 8-Feb-87 19:16:24 EST
Article-I.D.: winchest.112
Posted: Sun Feb 8 19:16:24 1987
Date-Received: Tue, 10-Feb-87 02:17:43 EST
References: <2348@homxb.UUCP> <15203@onfcanim.UUCP> <293@ames.UUCP> <2366@homxb.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 81
Keywords: Benchmark, C, performance measurement
Xref: watmath comp.arch:308 comp.sys.misc:331

In article <2366@homxb.UUCP> gemini@homxb.UUCP (Rick Richardson) writes:
...TIME, TIMES, Dhrystone accuracy, etc....
>
>Besides, anybody who quibbles over a 10% difference isn't looking
>at the whole picture when selecting a machine.  Dhrystones just
>get you looking at the right performance arena.  Other factors
>(software, support, migration path, etc.) will get you to the
>final decision.

1) This seems like a reasonable answer, which is in agreement with
Gene Miya's adverse comments on ANY single figure of merit.  [That's
why we end up publishing a Performance Brief that's pretty large, to
include enough different benchmarks and explanations to have even a
chance of meaning anything.]

2) I'd generally consider Dhrystone to have about a single digit of
accuracy.  In particular, there are all sorts of anomalies with regard
to heavily-cached machines, and the rules regarding allowable
optimizations.  For example, on "realistic" integer benchmarks, our
"5MIPS" M/500s are about 20% faster than a "4MIPS" VAX 8600, and many
hours of working on both says this is consistent, although the
Dhrystone numbers would claim the M/500 (somewhere in the 10-12,000
range, depending on exactly what optimizations are/aren't allowed for
consistency) is 1.7X to 1.9X the 8600's 6000-7000.
I'd strongly expect that Dhrystone often overstates the performance of
some micros relative to superminis: for example, Intergraph's recently
published numbers for its 30MHz Clipper box include about 8300
Dhrystones, making it look faster than the 8600, even though it was
somewhat/substantially slower on any of the more realistic benchmarks
that they showed.

3) As Rick has always maintained, things like Dhrystone [or Whetstone,
or Linpack, or...] are just a starting point.  However, as it stands,
Dhrystone is the only integer benchmark of any size whatsoever [since
things like Ackermann's function, sieve, etc. are way too small to be
good predictors of anything] that is commonly cited and collected.
Maybe it's time to create either a Dhrystone 2.0 or some other integer
benchmark, to get at least 1 figure of merit that's a bit better,
although there is still no substitute for running the real
applications.

4) How about some suggestions for ways to improve Dhrystone?  (This is
not necessarily an attempt to make it "perfect" (no such thing) or even
necessarily a good model for operation statistics, but at least to
make some obvious improvements.)  Here are a few suggestions for
starters:

a) Eliminate the problems that stop one from using real optimizers.
As it stands, it is very hard to compare any numbers, since you don't
know what sorts of optimizations are done.  This is particularly
handicapping to those of us who have real optimizers, but can't use
them for the test.  In some cases, if you eliminate the global
optimizer, you also eliminate optimizations which are included in
other people's compilers [that don't have separate optimizers], so
they get the benefit of the techniques.  Note that real optimizers are
much more prevalent than they were when Dhrystone was proposed.  As
Rick says in his Dhrystone notes: "To summarize, DHRYSTONES by
themselves are not anything more than a way to win free beers when
arguing 'Box-A versus Box-B' religion.
They do provide insight into Box-A/Compiler-A versus Box-A/Compiler-B
comparisons."  Right now, the best compilers are penalized [10-20%,
from our numbers].

b) STRCPY: we've seen strcpy take up to 30% of the execution time.
Does anybody have any real numbers from real programs on the real
usage patterns for the str* and mem* routines?  [Call frequency,
distribution of string sizes, etc.]  We suspect that Dhrystone spends
more of its time in strcpy than most real programs do.

c) Other areas [much more work]:
    1) Make the program bigger to avoid small-cache effects.
    2) Rework overall program behavior to be more like substantial
       programs [we do a lot of statistics-gathering on references,
       instruction frequency, branch counts, function calls, etc., and
       Dhrystone as it sits isn't exceptionally representative.]

Enough: again, no 1 figure-of-merit does it, but let's at least keep
this one in proper perspective [which I think Rick does].
-- 
-john mashey	DISCLAIMER:
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086