Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!mash
From: mash@mips.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: SPARC implementation or architecture
Message-ID: <2500@spim.mips.COM>
Date: 20 Apr 91 01:13:53 GMT
References: <1991Apr17.183822.7681@elroy.jpl.nasa.gov> <41377@cup.portal.com> <1991Apr18.142341.23097@rice.edu>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Inc.
Lines: 249
Nntp-Posting-Host: winchester.mips.com

In article <1991Apr18.142341.23097@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes:
>	An Analysis of MIPS and SPARC Instruction Set Utilization on
>	the SPEC Benchmarks
>	Cmelik, King, Ditzel, Kelly
>	ASPLOS-IV, 1991

>Basically, the MIPS executed more instructions, except on most of the integer
>benchmarks. The authors note that a fairer comparison, taking into
>account register window overhead, interlocks, and annulled instructions
>still gives a 9% advantage to the SPARC.

>The biggest contributer to the difference seemed to be that the
>MIPS required two instructions to load or store a DP floating-point value.
>Hence the wide disparity in the FP benchmarks.

>Someone (Charlie Price?) from MIPS objected that the study had been carried
>out with an old generation of the MIPS compilers, and that newer numbers
>were significantly better for the MIPS. Ditzel admitted this was possible,
>but noted that they had used what was available on the market when they did
>the study.

>The paper goes on to discuss the affect of libraries, load/store usage,
>branches, nops, integer ops, and fp ops. Lots of good ideas for
>both architectures. The appendix contains detailed numbers for each
>of the benchmarks.
>
>BTW, the numbers were collected with pixie (mips) and spixie (sparc).
>One the consistantly interesting parts of the conference is the methodology
>used to perform experiments. Lots of good ideas here.

1) Definitely lots of good analysis here; it is heartening to see such
detailed analyses done on more meaningful programs, and when everybody
gets their ASPLOS proceedings, it is worth studying. A subset of the
conclusions is fairly accurate, and in particular, the corrections done
for plausible extensions/changes are reasonably OK, as well as most of
the analysis about the effects of various features.

***********************************************************************
*Of course, most of the overall conclusions turn out to be wrong, if you
*use contemporary MIPS compilers released the same week as the Sun
*compilers used in the study, as opposed to 1-year-old MIPS compilers.
*	In general, using the same approach as Sun, I get something like
*	a 10% edge for MIPS (but read on to understand the numbers).
***********************************************************************

2) Sun did a perfectly reasonable thing, which is to use the compilers
they could buy off-the-shelf from us for the analysis. Unfortunately,
what fails to appear in the paper are the release dates of the
compilers, i.e., that:
	MIPS 2.10 release was 2Q90
	Sun compilers were announced for release last week

Why that might be relevant:
In the last year, I believe that Sun has devoted a large amount of
analysis and tuning to their compilers, using SPEC, among other things.
This is evidenced, of course, by the exhaustive analysis in this paper,
as well as by the improvement from 10.0 SPECmarks to 11.8 SPECmarks for
25MHz SPARCs (SS1+ in 2Q90 to SS IPC in 2Q91, with new compilers).
A 25MHz MIPS Magnum went from 17.8 to 18.6 in the same time, and it
actually turns out that MIPS has done some (but not huge) analysis and
tuning for these benchmarks, and some of that shows up in the current
compilers (2.20), which happened to have been released within a week of
when the new Sun compilers were announced ..... although SPEC numbers
for both MIPS and Sun compilers in Beta form were published a while ago
by both.

(Note, just for the record, that there is NOTHING wrong with anybody
using the SPEC benchmarks to tune their compilers; it's infinitely
better for the buyers of computers out there if people do it with SPEC
than with Dhrystone or Whetstone. I.e., if Sun spends a bunch of effort
analyzing SPEC to death and tuning things up, more power to them,
because the tunings are much more likely to encourage optimizations
that will help real programs, and actually DO something for a customer.)

3) The analysis in the paper is well worth reading, and the micro-level
discussions are very useful. The conclusions mostly follow from the
data; it's just that a 1-year difference in compiler choice still makes
a difference, and the general conclusions end up getting wiped out by
the MIPS compilers' gain in that year....

4) Let's start with the raw instruction counts (remember that these
need to be corrected for nops, annuls, stalls, etc., etc.).

Instruction counts in Millions:
Most numbers from Sun paper, Tables A1 and A2
Total   = raw instruction counts
Total+  = SPARC counts with adjustments for window-handling, annulled
	  instructions, load-use stalls (i.e., a little closer comparison)
MIPS220 = equivalent of Total, but with 2.20 compilers rather than 2.10.

Notation of form (n/m) means column n divided by column m.
The most important thing to look at is the differences between columns
1 and 4, and between columns 7 and 8.

FIRST TABLE:
COL              1        2      3        4      5        6      7      8
Source         Sun      Sun    Sun     mash   mash      Sun    Sun   mash
Bench         MIPS    SPARC    M/S  MIPS220    M/S    SPARC    M/S    M/S
             Total    Total  (1/2)    Total  (4/2)   Total+  (1/6)  (4/6)
-------------------------------------------------------------------------
spice       21,569   22,878   0.94   20,114   0.88   26,516   0.81   0.76
doduc        1,613    1,303   1.24    1,392   1.07    1,335   1.21   1.04
nasa7        9,257    6,615   1.40    9,186   1.39    6,719   1.38   1.37
matrx300     2,776    1,694   1.64    2,339   1.38    1,695   1.64   1.38
fpppp        2,316    1,443   1.61    2,111   1.46    1,472   1.57   1.43
tomcat       1,813    1,626   1.11    1,738   1.07    1,640   1.11   1.06
-------------------------------------------------------------------------
FP Geometric Mean             1.30            1.19            1.25   1.15
-------------------------------------------------------------------------
gcc          1,111    1,155   0.96    1,149   0.99    1,317   0.84   0.87
espresso     2,829    2,931   0.97    2,723   0.93    3,397   0.83   0.80
li           6,023    4,661   1.29    5,938   1.27    6,131   0.98   0.97
eqntott      1,243    1,322   0.94    1,244   0.93    1,458   0.85   0.85
-------------------------------------------------------------------------
Integer Geometric Mean        1.03            1.02            0.88   0.87
-------------------------------------------------------------------------
Overall Geometric Mean        1.18            1.12            1.09   1.03
-------------------------------------------------------------------------

Now, with this data, I draw several conclusions, some of which are
rather different than those given in the paper. Again, please note that
the above is not PERFORMANCE data, but instruction-count (Total) or
Sun-adjusted-instruction-count (Total+) data, or my data on MIPS-1 with
current compilers. I'm particularly looking at the rightmost column above:

1) MIPS uses more instructions on floating-point [mostly due to the
lack of 64-bit FP load/stores, which is especially crucial to the
linear-algebra and related benchmarks]. The FP code improved somewhat
from 2.10 to 2.20, i.e., some of the effects seen in the paper were
from compilers where the benchmarks had barely been looked at in any
serious fashion...

2) MIPS usually uses fewer instructions on integer programs, and
definitely uses fewer instruction-equivalents (i.e., from Total+).
The range is from .80 to .97, with a 95% confidence interval of
[0.76 to 0.99].

5) Now, let's look at the analysis at the end of the paper, which is
quite interesting (and is, in fact, a good example of the kinds of
analyses architects do or should do when figuring out how to design
and/or evolve an architecture). What they did was: start with the MIPS
Total/Total+ counts (always equal), and the SPARC Total+ counts, and
then estimate the effects of adding instructions to each architecture,
to make them architecture-neutral, and then compare, mostly to compare
compilers, I guess.

Columns 1-3 of the second table were from Sun, and the MIPS Total+~ was
computed from the earlier Total by giving MIPS a 64-bit load/store
(probably the dominant effect). SPARC got the int<->fp improvements.

Column 4 is computed from the actual numbers for MIPS-2 machines
{R6000, R4000}, which have the 64-bit load/stores hypothesized by Sun,
plus load-interlocks and annulled-branches, and sqrt, and a few other
things. It would have been interesting to have seen the Sun numbers, as
modified not just by the int<->fp change, but also by the integer
mul/div and anything else coming in the next-generation SPARCs;
however, the paper makes the case that the mul/div issue is not a large
one for this set of benchmarks (maybe on the order of a percent, at
most, depending on the benchmarks, especially as the multiply left in
the inner loop of matrix300 on MIPS has disappeared).

SECOND TABLE:
COL              1        2      3        4      5   6
Source         Sun      Sun    Sun     mash   mash   mash
Bench         MIPS    SPARC    M/S   MIPS-2    M/S   M/S of adjusted
           Total+~  Total+~  (1/2)    Total  (4/2)   (col 4 of prev tab-adj)/2
-------------------------------------------------------------------------
spice       20,211   25,095   0.81   18,429   0.73   0.75
doduc        1,358    1,301   1.04    1,056   0.81   0.87
nasa7        6,927    6,682   1.04    6,454   0.97   1.03
matrx300     2,126    1,695   1.25    2,339   0.99   1.00
fpppp        1,616    1,440   1.12    1,319   0.92   0.98
tomcat       1,377    1,607   0.86    1,283   0.80   0.81
-------------------------------------------------------------------------
FP Geometric Mean             1.02            0.86   0.90
-------------------------------------------------------------------------
gcc          1,111    1,262   0.88    1,122   0.89
espresso     2,829    3,397   0.83    2,648   0.81
li           6,016    5,626   1.07    5,504   0.98
eqntott      1,243    1,371   0.91    1,247   0.91
-------------------------------------------------------------------------
Integer Geometric Mean        0.93            0.90
-------------------------------------------------------------------------
Overall Geometric Mean        0.97            0.88

Now, more notes:

1) It might be interesting to recalibrate the Sun-computed overall Geo
Mean above to allow for the MIPS 2.20 compilers. I haven't done this in
detail, but note that going from 2.10 (on which columns 1 and 3 above
are based) to 2.20 changed the overall Geo Mean of Total+ in the
earlier table from 1.09 to 1.03, with most of the effect being in the
FP area. Thus, I'd guess that the overall number as adjusted by Sun
would have come out around .94, rather than .97.

2) As noted above, it would have been really fascinating to have seen
the actual next-generation SPARC numbers as column Total+~ above (but I
realize that would have been a bit much to ask, since while MIPS-2
changes are guessable, having been public for over a year, the SPARC
changes aren't yet public, I think (?))

3) So, to summarize, in the paper's conclusions, it says:

a) "MIPS typically executes 18% more user-level instructions than SPARC"
	This could be rewritten as
	"MIPS typically executes 12% more user-level instructions than
	SPARC", although in any case, one must be very careful about
	the word "typical" here, in that one can also say that MIPS
	usually executes fewer instructions on integer code, and more
	on FP code. ("typical" is not a statistical term :-)

b) "A fairer comparison which takes into account register-window
	overhead, load-use interlocks, and annulled instructions, still
	shows a 9% advantage for SPARC"
	turns into:
	A fairer comparison shows a 3% advantage for SPARC.

c) "most significant differences are SPARC's DP load/store, and MIPS
	compare-and-branch"
	Probably so.

d) "When architectural factors were factored out, the differences due
	to combined compiler/library effects were so small (3%) that
	neither MIPS nor SPARC has any significant advantage."
	Wellll... include MIPS 2.20 compilers:
	-the 4 integer benchmarks range from 3% less to 20% less, with
	 Geo mean = 13% less.
	-the 6 FP benchmarks are a little harder to figure, but if you
	 take the 2.20 compiler numbers (Col 4 of first table), and
	 subtract Sun's adjustments for 64-bit, and divide the result
	 by Sun's Total+~ column, you get the ratios shown in Column 6
	 above, which, not surprisingly, are in between what Sun
	 computed, and what we actually get in MIPS-2.
	 In any case this yields 10% less, using Sun's rules.

Anyway, one must be VERY careful to avoid over-generalization from a
small number of data points, and note, all of this was instruction
COUNTS, not cycles, and cycles are always more important.

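For anyone who wants to check the arithmetic behind items b) and d),
here is a minimal Python sketch. The counts are copied from the two
tables in this post. Two things are assumptions on my part rather than
anything stated in the paper: that "Sun's adjustment for 64-bit" for
each FP benchmark is simply the MIPS 2.10 Total minus Sun's MIPS
Total+~, and the names geo_mean, col8 and col6, which are made up for
illustration.

```python
from math import exp, log

def geo_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))

FP = ["spice", "doduc", "nasa7", "matrx300", "fpppp", "tomcat"]
INT = ["gcc", "espresso", "li", "eqntott"]

# First table, column 4: MIPS Total with the 2.20 compilers (millions).
mips220 = {"spice": 20114, "doduc": 1392, "nasa7": 9186, "matrx300": 2339,
           "fpppp": 2111, "tomcat": 1738, "gcc": 1149, "espresso": 2723,
           "li": 5938, "eqntott": 1244}
# First table, column 6: SPARC Total+ (millions).
sparc_plus = {"spice": 26516, "doduc": 1335, "nasa7": 6719, "matrx300": 1695,
              "fpppp": 1472, "tomcat": 1640, "gcc": 1317, "espresso": 3397,
              "li": 6131, "eqntott": 1458}
# FP only: first table column 1 (MIPS 2.10 Total), then second table
# columns 1 and 2 (Sun's MIPS Total+~ and SPARC Total+~).
mips210 = {"spice": 21569, "doduc": 1613, "nasa7": 9257, "matrx300": 2776,
           "fpppp": 2316, "tomcat": 1813}
mips_adj = {"spice": 20211, "doduc": 1358, "nasa7": 6927, "matrx300": 2126,
            "fpppp": 1616, "tomcat": 1377}
sparc_adj = {"spice": 25095, "doduc": 1301, "nasa7": 6682, "matrx300": 1695,
             "fpppp": 1440, "tomcat": 1607}

# Column 8 of the first table: MIPS 2.20 Total divided by SPARC Total+.
col8 = {b: mips220[b] / sparc_plus[b] for b in FP + INT}
fp_mean = geo_mean(col8[b] for b in FP)
int_mean = geo_mean(col8[b] for b in INT)
all_mean = geo_mean(col8.values())

# Column 6 of the second table. ASSUMPTION: the 64-bit adjustment is
# (MIPS 2.10 Total - Sun's MIPS Total+~), subtracted from the 2.20 count.
col6 = {b: (mips220[b] - (mips210[b] - mips_adj[b])) / sparc_adj[b]
        for b in FP}
col6_mean = geo_mean(col6.values())

print(f"col 8 geo means: FP {fp_mean:.2f}  Int {int_mean:.2f}  "
      f"All {all_mean:.2f}")
print("col 6 ratios:   ", {b: round(r, 2) for b, r in col6.items()})
print(f"col 6 geo mean:  {col6_mean:.2f}")
```

Under that assumption the sketch reproduces, to two decimals, the
column 8 geometric means (1.15, 0.87, 1.03) and the column 6 ratios and
their geometric mean (0.90) as given in the tables. Working from the
raw counts rather than from the rounded ratios avoids compounding
rounding error in the means.
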
Nevertheless, this kind of analysis in the Sun paper is very useful to
compiler writers (to answer questions like: why are THEY beating us in
that benchmark on instruction counts? are they doing something special?
is it architecture, or our compilers missing an optimization?)

However, the major conclusion, that the two are indistinguishable, is
wrong, if you think 10% is distinguishable...
-- 
-john mashey	DISCLAIMER:
UUCP:	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS:	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650