Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!uunet!ogicse!ucsd!nosc!marlin!aburto
From: aburto@marlin.NOSC.MIL (Alfred A. Aburto)
Newsgroups: comp.benchmarks
Subject: Re: Which benchmarks are useless?
Keywords: benchmarks date statistical correlation
Message-ID: <1756@marlin.NOSC.MIL>
Date: 2 May 91 15:28:10 GMT
References: <2717@spim.mips.COM> <1751@marlin.NOSC.MIL> <2800@spim.mips.COM>
Distribution: comp.benchmarks
Organization: Naval Ocean Systems Center, San Diego
Lines: 120

In article <2800@spim.mips.COM> mash@mips.com (John Mashey) writes:
>In article <1751@marlin.NOSC.MIL> aburto@marlin.NOSC.MIL (Alfred A. Aburto) writes:
>>In article <2717@spim.mips.COM> mash@mips.com (John Mashey) writes:

>For the high ones, look at IBM RS/6000, i860, or maybe Motorola 88K.
>AMD would probably be high, but doesn't have SPEC result published,
>to my knowledge.
>Note that IBM's 27.5/15.8 = 1.7+ ... and I think you'll find the i860 is
>probably up there as well ... and the DG workstation labeled 17 mips
>gets around 10 on SPECint ...
>

Sorry about the delayed response, but I was out of town too.

Unfortunately I don't have all that information. I did see i860
Dhrystone 2.1 and SPECmark numbers posted in this newsgroup, but they
were not broken down like the other information I had, so I didn't use
them (I needed Dhrystone 1.1 and the SPECratios for each of the integer
SPEC programs). Does anyone have this information for the i860 and
RS/6000 systems?

>I don't think we disagree, except in choice of data. Of the data points,
>the VAX is = 1 by definition.
>4 of them are MIPS machines
>1 is an HP [which is fairly similar to MIPS, and shares some roots in
>  similar compiler technology]
>3 are SPARCs
>
>I.e., as I said, within product lines you expect that the major determiner
>of speed is clock rate, and Dhrystone will show you that.

There were 3 HP systems and one was a 68030 type.
I thought I went across product lines reasonably well, but it was not
complete of course (things never are, but that's good I guess). I did
the correlation with respect to clock rate across the 10 systems:

                  Dhrystone         SPECratio          SPECint
                    V1.1     ----------------------     Ratio
                             GCC    ESP    LI    EQN
Correlation WRT
Clock Speed:        0.54     0.69   0.54   0.45  0.60    0.58

The correlation with clock speed appears marginal, which is interesting.
There appear to be other things going on besides just clock speed
increases.

>As a minor point, for whatever reason, most of the dhrystone-vax-mips
>ratings in the world assume VAX-11/780 = 1,757 1.1 Dhrystones,
>which slightly raises the numbers everywhere.

Yes, I was aware of that, but I felt constrained to use the peak numbers
as given in that article, and the article indicated 1870 Dhrys/sec (V1.1)
peak for the VAX 11/780. I've seen the 1757 Dhrys/sec (V1.1) referenced
in IBM advertisements for their POWERstations, but that is all I know
about that number.

>The major issue is (just to make sure people aren't confused by the posted
>table):
>IF you pick two machines at random, A and B:
>
>  a) Dhry(A) and Dhry(B) will both give vax-mips ratings that are high.
>  b) Dhry(A)/Dhry(B) will give reasonably good correlations with
>     SPEC(A)/SPEC(B), especially if A and B are from same family or
>     are related.

Based on the results I'd say Dhry(A) and Dhry(B) yield VAX-MIPS ratings
that are 14% to 24% high WHEN COMPARED to SPECint(A) and SPECint(B)
VAX-MIPS ratings. I'd hesitate to infer anything beyond that as I'm
still seeking more information.

The results indicate that Dhry(A) / Dhry(B) ratios correlate strongly
(I would say 'strongly' vice 'reasonably good') with SPEC(A) / SPEC(B)
ratios. The correlations were rather high after all, with a minimum of
0.90 and a maximum of 0.98. The average was 0.96 and the correlation
with SPECint was 0.99. These high correlations might not hold up,
though, if we had a larger data base to examine.
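
For anyone who wants to repeat this kind of comparison on their own
numbers, here is a minimal sketch of the two calculations discussed
above: converting a Dhrystone rate to a VAX-relative rating, and
computing a correlation coefficient between two sets of results. It
assumes a plain Pearson correlation, which may not be exactly what was
used for the posted figures. The function and constant names are
hypothetical; only the 1870 and 1757 Dhrystones/sec baselines come from
this thread.

```python
from math import sqrt

# VAX 11/780 baselines in Dhrystones/sec (V1.1), as quoted in this thread.
VAX_PEAK_DHRY = 1870.0    # peak figure from the article used for the posted table
VAX_COMMON_DHRY = 1757.0  # figure assumed by most published vax-mips ratings


def vax_mips(dhrystones_per_sec, baseline):
    """Express a Dhrystone rate relative to a chosen VAX 11/780 baseline."""
    return dhrystones_per_sec / baseline


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# The same machine rated against the two baselines differs by this factor
# (roughly 6% higher when the 1757 baseline is used).
baseline_inflation = VAX_PEAK_DHRY / VAX_COMMON_DHRY
```

To use it, pass one list of Dhrystone-based ratings and one list of
SPECint ratings for the same machines, in the same order, to pearson().
Changing the baseline rescales every Dhrystone rating by the same
factor, so it shifts the "percent high" figures but leaves the
correlation coefficient unchanged.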
I think we still need to sift through more data. The correlations were
across several different CPUs, so I don't accept the 'especially ...'
part of b) above as part of the results.

>Unfortunately, there are also plenty of data points, specifically,
>with machines that included instructions to help strcpy, or have done
>certain optimizations, where you easily pick points where:
>	Dhry(A) > Dhry(B) and SPEC(A) < SPEC(B), by a substantial margin.

Yes, this is very true, but please note that there are also cases where
Dhry(A) < Dhry(B) AND SPEC(A) > SPEC(B). There is an example of this in
the table of results I posted. It is not a substantial difference, but
still a difference.

By the way, I'm not 'down on SPEC' or 'up on Dhrystone'. I think SPEC
is the best thing that has happened to benchmarking recently. SPEC is
certainly developing an excellent data base on system performance: a
verifiable, repeatable, and solid data base, something we've needed for
a long time.

Dhrystone results, however, are confusing, mostly because it is a small
program, it is cache sensitive, and it can be optimized to such a large
extent. So, yes, I agree one can get surprised and confused by the
Dhrystone results. One needs to be very careful in using those numbers.
I picked the peak numbers only because they appeared to be more
consistent. If I had used the low numbers, or an average of the low and
high, then I don't think the results would have been nearly the same.

I was interested in the correlation of OTHER integer programs with
SPECint too. I used Dhrystone because results were readily available.

>Anyway, maybe someone can put together a table with
>	1 MIPS
>	1 SPARC
>	1 RS/6000
>	1 i860
>	1 HP
>	1 88K
>	1 68K
>	1 486
>and using the more common 1757 number...
>

Yes, the data must be available here and there, and it would be good to
get it all together in one place (here I hope) .....

Al Aburto
aburto@marlin.nosc.mil