Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!apple!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Integer/Multiply/Divide on Sparc
Message-ID: <34110@mips.mips.COM>
Date: 4 Jan 90 06:00:37 GMT
References: <158@csinc.UUCP> <787@stat.fsu.edu> <42701@lll-winken.LLNL.GOV> <788@stat.fsu.edu> <42737@lll-winken.LLNL.GOV> <5842@ncar.ucar.edu> <34058@mips.mips.COM> <28594@amdcad.AMD.COM>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 97

In article <28594@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
>In article <34058@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>| Could somebody post the critical parts of this again so we can
>| look at it?  Although I have high respect for Plum-Hall in general,
>| I'm always nervous about micro-level benchmarks.  Now, I hate to have
>| to defend SPARC :-), but I must: realistic integer benchmarks
>| that I know [like the SPEC ones] simply don't correlate with
>| the results claimed below, at least not very much.
>| The RISC machines are noticeably faster on actual integer programs....

>The benchmarks over-emphasize integer modulus.  For example, the
>benchmark that reportedly tests register-integer variables looks like:
......
>and spends roughly 75% of its time performing the "%" operation.

Like I said, I'm always nervous about micro-level benchmarks, even when
done by smart people.  Here is the summary, followed by details:

SUMMARY OF MY ADVICE:
1) do NOT EVER use this benchmark in the belief that it means anything;
   if you have a copy, throw it away.
2) FORGET any conclusions that anyone has posted about relative
   performance of machines, based on this benchmark, other than the
   possible thought that integer multiply/divide don't happen to be
   done in hardware on SPARC and HP PA.
3) If you've ever told anyone this means much, please tell them you're
   sorry.

DETAILS:
Bo Thide kindly sent me a copy, and I took a quick look, finding
similar results to Tim's: optimized R3000 code spent 60% of the time
doing % (and remember, we have one of those in hardware...)

Tables were given that looked like this:

                  register  auto    auto    int       function  auto
                  int       short   long    multiply  call+ret  double
-------------------------------------------------------------------------------
cc: Sun-4 (-O4)   0.41      0.44    0.40    4.45      0.67      1.00
......

WHAT'S RIGHT:
1) The code is very carefully done to eliminate surprise optimization.

WHAT'S WRONG:
1,2,3) Columns 1, 2, and 3, which purport to measure the performance of
   various integer code, are completely dominated by the modulus
   operation, which is simply contrary to the statistics of the
   overpowering bulk of code out there.  It would be plausible to
   generate something that had a mix of +, -, *, /, % and logic ops,
   using carefully chosen frequencies from a number of real programs
   (and even there, there are pitfalls), but something that does no *
   or /, and % way out of proportion, is guaranteed to blast a SPARC
   about as badly as it can be, relative to almost anything else.
   It won't help HP PAs much either....
   Also, for column 1, optimized, I got 0% loads, and 5% stores,
   rather than the more typical 20% and 10%.

4) This column indeed measures the speed of integer multiply, in such
   a way that no compiler can do anything but do real multiplies
   with it.

5) This column measures the speed of function call/return with zero
   arguments.  Unfortunately, different programs have different
   distributions of numbers and types of arguments, and many functions
   have arguments.  Different machines differ greatly in the cost of
   passing arguments, and in the costs of passing different numbers of
   arguments....

6) I haven't looked at the statistics of this much, except to notice
   there are equal numbers of FP * and /, which is also atypical.

----
7) In general, although it's been said before in this newsgroup:
   a) People design computers using the statistics of real programs.
   b) The statistics of real programs differ, hence the tradeoffs you
      make depend on the benchmarks chosen.
   c) There are certain classes of codes for which at least one of
      integer *, /, or % are important enough, and cannot be gotten rid
      of even by a perfect compiler, where having these in hardware
      will help a lot.  Over many realistic programs, hardware helps
      about enough that some people chose to include it, and some
      didn't.  There's no way in the world that having it makes a 2-3X
      performance difference, overall, although you can find some real
      programs where it does.
   d) Like many synthetic benchmarks, it simply doesn't have a mixture
      of expressions that relates well to what real compilers do,
      i.e., there is little that an optimizing compiler can do with
      this code, and a small number of registers are completely
      adequate.  Neither of these two is generically true for real
      code.

Anyway, these benchmarks mostly measure integer multiply and divide;
these operations are where most RISCs have the least advantage over
most CISCs; these operations are definitely what anybody would use to
show that some CISC is faster than SPARC or PA; but it just doesn't
correlate very well with the speeds on real programs.
--
-john mashey	DISCLAIMER:
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086