Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!zaphod.mps.ohio-state.edu!brutus.cs.uiuc.edu!apple!vsi1!wyse!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: SPARC vs MC68040
Message-ID: <35846@mips.mips.COM>
Date: 12 Feb 90 18:43:05 GMT
References: <8859@portia.Stanford.EDU> <5190@convex.convex.com> <1850@cbnewsi.ATT.COM> <2938@oakhill.UUCP> <3085@rtmvax.UUCP> <35825@mips.mips.COM> <2943@oakhill.UUCP>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 135

In article <2943@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:
>In article <35825@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>
>>>> is 23,148 KDhrys. The MC68040 runs that benchmark at twice the speed.
>
>>A bunch of people at various companies are busily stuffing SPEC numbers into
>>spreadsheets, plus published mips-ratings, and analyzing. I'm also trying
>>to calibrate i486 and 68040 numbers into this scheme.
>
>What does this have to do with Dhrystone?

Sorry; among other things, when you start looking at such data, you see
that:
a) Within a machine line, using the same compilers, Dhrystone correlates
   at least somewhat with integer performance on real benchmarks.
b) It has some correlation across machine lines.
c) If one machine uses the inline strcpy/strcmp and the other doesn't,
   the difference in Dhrystone performance badly mispredicts the
   difference on realistic programs.

>The Dhrystone benchmarks have known weaknesses. The SPEC benchmarks have
>their own. Many people are interested in Dhrystone so it gets talked about.
>If you don't care for discussions on Dhrystone then simply ignore them.

Impossible: it causes too much confusion, I keep having to explain it to
financial analysts, and I'm tired of that. The SPEC benchmarks have their
own weaknesses, of course, but they're hardly in Dhrystone's class.
>>"In any case, for serious performance evaluation, users are advised to
>>ask for code listings and to check them carefully."
>This true for ALL benchmarks. Do you think it only applies to Dhrystone?
>Do you think it does not apply to the SPEC benchmarks? I know I wouldn't
>be choosing a computer architecture without looking at code the compiler
>produces.

Of course, but in Dhrystone's case, if all you have is the numbers for
two machines, you know very little about their relative performance
without looking at the code. It is especially irksome that Dhrystone
rewards an optimization that improves its score greatly but does not
significantly improve realistic programs. (That doesn't mean that
selective inlining of string routines is bad; in fact, if Dhrystone
contained a REPRESENTATIVE set of string operations, I wouldn't object
so much, but it doesn't.)
>
>> EVERYBODY knows that inlining strcpy&strcmp can boost the number
>> strongly without giving anything like that boost on real programs.
>> SO POST THE CODE WHERE THE CRUCIAL STRCPY/STRCMP calls are made;
>> otherwise, the number is simply meaningless, because anybody can
>> boost the performance substantially on Dhrystone by an optimization
>> that has relatively little effect on real programs.
>
>I fail to understand your tone here. By your own admission in a posting
>you did to this newsgroup on March 15, 1989:
>
>    "Now, according to the letter or the law of Herr Doktor Weicker's
>    Dhrystone 2.1 writeup, it's OK to in-line strcpy and strcmp.

Yes, but subject to the comment above, which most people ignore: hardly
anyone shows the code for this. The SPEC benchmarks were chosen to allow
any optimization you like, but with the effect that there are very few
optimizations you can do on them that won't also help lots of real
programs.
>
>and this is what the MC68040 compiler does. So just what is the problem?
>Here is one of the string copies (they all look similar) directly from the
>benchmark's .s file:
>
>        lea.l   (12,%sp),%a5
>        mov.l   %a5,%a1
>        mov.l   &L%93,%a0
>        mov.l   (%a0)+,(%a1)+
>        mov.l   (%a0)+,(%a1)+
>        mov.l   (%a0)+,(%a1)+
>        mov.l   (%a0)+,(%a1)+
>        mov.l   (%a0)+,(%a1)+
>        mov.l   (%a0)+,(%a1)+
>        mov.l   (%a0)+,(%a1)+
>        mov.w   (%a0)+,(%a1)+
>        mov.b   (%a0)+,(%a1)+

Good! You at least did it correctly in the general case, unlike the
i860, which pads this to 32 bytes so it can do 2 quad-word loads &
stores...
>
>Now let's see you post the code that your MIPS compiler produces. Then tell
>us what you find to be relevant about the two postings.

Dhrystone usually overpredicts VAX-relative performance; on most
machines, if I know that this inlining is being done, I can estimate
that it overpredicts by another 20-30%. That's what's relevant. The
numbers we use all come from:

        jal     strcpy

and I've seen SPARC code that is the equivalent. That means, I think,
that such numbers don't overpredict as much. (They still overpredict,
and this has been well documented for years in published materials.)

And the reasons we don't inline str* are:
a) When you inline code, it gets bigger.
b) You might want to inline it only in those places where it's called a
   lot.
c) But there's a complicated set of rules for when it's really a good
   idea in general. Among other things, MOST strcpy's aren't of
   constants; they're of pointers to things whose alignment can't be
   predicted, or at least the target is some arbitrary pointer, and then
   this optimization doesn't work very well. The only cases I've seen
   that looked like they would really pay off are inlining strcpy's of
   small constants (1-2 bytes), or ones where you happen to know the
   alignment, and then for up to a few words.
d) Remember, we actually do full-bore inlining in the general case....
   but are forbidden by the rules from using it.... and we don't.

Here's the bottom line: either Dhrystone is a good predictor of integer
performance on real programs, or it isn't.
If it is (and it once almost was), then it's a Good Thing, because it's
simple and easy to use. If it doesn't correlate well with performance on
real programs, then it has become obsolete. Rather than replowing ground
that has been plowed for years, let's try something else as a bottom
line, and get something concrete:

QUIZ: It is claimed that a 25MHz 68040 is 2X faster than a 25MHz SPARC
on Dhrystone; for concreteness, consider a 68040 with at least 64K of
external cache.
a) Will it be 2X faster on the Geometric Mean of the 4 SPEC C
   benchmarks? (Using the same compiler as for Dhrystone.)
b) Will it be more than 2X?
c) Will it be less than 2X?
d) Will it be a lot less than 2X, in fact, maybe closer to 1X?

I'd encourage anyone who posts to include an Analysis to back up their
opinion, with some data; I'm working on a Guesstimate for about a week
from now. If someone prefers other realistic benchmarks, that would be a
good exercise as well.

In any case, thanx to Mr. Trissel for properly qualifying the Dhrystone
number; this actually helps a lot.
-- 
-john mashey    DISCLAIMER:
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086