Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!ucsd!nosc!marlin!aburto
From: aburto@marlin.NOSC.MIL (Alfred A. Aburto)
Newsgroups: comp.benchmarks
Subject: Re: bc benchmark [really: One Number]
Message-ID: <1685@marlin.NOSC.MIL>
Date: 2 Jan 91 21:45:19 GMT
References: <44342@mips.mips.COM> <15379@ogicse.ogi.edu> <44353@mips.mips.COM>
Reply-To: aburto@marlin.nosc.mil.UUCP (Alfred A. Aburto)
Organization: Naval Ocean Systems Center, San Diego
Lines: 59
Distribution: comp.benchmarks

In article <44353@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>(Note, for example, that published Dhrystone results easily mis-predict
>SPEC integer benchmarks pretty badly, i.e., it is quite easy for machine
>"a" to be 25% faster on Dhrystone than "b", and end up 25% SLOWER on more
>realistic integer benchmarks.)
>--
>-john mashey DISCLAIMER:

This is an interesting observation (result).

Dhrystone was intended to be REPRESENTATIVE of TYPICAL integer programs.
That is, hundreds (I believe) of programs were analyzed to come up with
the (ahem) 'typical' high level language instructions and their frequency
of usage. In view of this I would, at first sight, suspect the Dhrystone
to be more accurate than SPEC, as SPEC is based upon only a few integer
programs.

What happened? Why does Dhrystone fail? Is it due to:

(a) Instruction Mix is WRONG?

(b) Optimization Problems?
    This is not a problem in my view --- we just need people to report
    results using various compiler options; then we gain a more proper
    perspective of the variation in performance. Of course, in general,
    people tend to publicly report the 'Max' or 'Best' performance. The
    'Min' or 'Mean' results are more difficult to find.
    I know the Dhrystone versions (1.0, 1.1, 2.0, 2.1) can all be
    optimized a great deal (up to a factor of 2 or so; I've done it),
    but this should not be a problem as long as we know what result
    corresponds to what compiler options --- this helps to define the
    RANGE of expected performance (Min, Max and/or Std. Dev.) with a
    certain compiler and system, and also the 'Mean' or 'Median'
    performance.

(c) Program Size TOO small?
    I suppose that if it were not for caching (cache size) effects then
    program size should not be a problem, but I'm no expert ...

(d) Something else?

Why should one expect the integer SPEC results to be more 'accurate'
than the Dhrystone? I'm just wondering.

What is a 'typical' program or 'typical' frequency of instruction usage?
Seems to me there is no one real 'typical' anything, but rather a wide
variety of 'typical' programs, instruction mixes, and frequencies of
usage, depending upon the application.

Real programs also show a great variation in performance. I noticed this
recently in a Scientific American article (Jan 1991) which showed the
comparison of 13 different real programs on a wide variety of
supercomputers. The program 'megaflop' variation in performance was truly
tremendous, especially for the fastest systems (Cray and a NEC computer,
I think).

Al Aburto
aburto@marlin.nosc.mil
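
P.S. A rough sketch of the bookkeeping suggested in (b) above: record one
Dhrystone result per compiler setting, then report the whole range rather
than only the best number. This is a minimal Python illustration, not a
measurement. The option names and the Dhrystones/sec figures are invented
for the example, and summarize() is a hypothetical helper.

```python
import statistics

# Hypothetical Dhrystones/sec for ONE machine and ONE compiler, keyed by
# optimization setting. These figures are invented for illustration only.
results = {
    "-O0": 10000.0,
    "-O1": 14000.0,
    "-O2": 18000.0,
    "-O3": 20000.0,
}


def summarize(scores):
    """Return min, max, mean, median and sample std. dev. of the scores."""
    values = list(scores.values())
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(results)
for name, value in summary.items():
    print(f"{name:>6}: {value:10.1f}")

# The spread between best and worst settings is the 'RANGE' in question.
print(f"max/min ratio: {summary['max'] / summary['min']:.2f}")
```

Reporting the min, mean and spread next to the max makes it plain which
compiler options a published figure corresponds to, and how far a 'Best'
result sits from what a user might see by default.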