Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!ucbvax!RICHTER.MIT.EDU!krowitz
From: krowitz@RICHTER.MIT.EDU (David Krowitz)
Newsgroups: comp.sys.apollo
Subject: Re: Snakebytes (long -- and poisonous?)
Message-ID: <9103272114.AA31661@richter.mit.edu>
Date: 27 Mar 91 21:14:36 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 39

Nope! You are correct. They are *fast* machines even if the application
does not fit in cache ... but not nearly as fast as the published
numbers imply. The 55 MIPS of the 720 versus the 28 MIPS of the Sparc 2
is a real performance edge. The 6.5 Mflops of the 720 on a 300x300 LU
decomposition is a real performance edge over the 2.6 Mflops of the
Sparc 2 on the same test ... it's a factor of roughly 2.5.

However, the *published* numbers being spread about are 17 Mflops for
the 720 vs 4.2 Mflops for the Sparc 2 ... a factor of 4 performance
edge, which is achievable only with compilers that will not ship for
another several months, and only for smaller data sets. A 256 KB data
cache is sufficient for many tasks (not any of ours, unfortunately --
geophysics applications tend to consider 500x500 systems of equations
as *small*, 1000x1000 as moderate, and 5000x5000 as
what-you-really-want-to-do-for-your-thesis ;-0 ).

It is critical, however, for people to understand the conditions of a
benchmark run. Because most of the benchmarks HP quotes run in-cache on
the 700 series, they tend to represent best-case results. Because most
of the benchmarks do *not* run in-cache on the DEC and Sun machines,
their results tend to be closer to the performance achievable across a
wider range of problems -- both large *and* small applications run at a
mere 4.2 Mflops on the Sparc 2, but only small applications run at
13.5 Mflops on the 720 ... the large ones run at 6.5 Mflops (unless
it's single precision, 32-bit, in which case it still runs at 13.5).

The key to all this is to *know* YOUR application, to *know* the
benchmark's characteristics, and to *know* what compilers and/or tuning
were used. These can make up to a factor of 2 difference in the results.

== Dave
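
The arithmetic behind the figures above can be checked with a rough sketch. It assumes 8-byte double-precision elements and 1 KB = 1024 bytes, and it uses the storage of the full matrix as a crude stand-in for the working set; the function names are illustrative only, and a real benchmark's cache behavior also depends on access pattern and blocking, which this ignores.

```python
# Rough working-set and speedup arithmetic for the numbers quoted in the post.
# Assumptions (not from the post): 1 KB = 1024 bytes, 8-byte doubles,
# and "fits in cache" means the whole dense matrix is no larger than the cache.

CACHE_BYTES = 256 * 1024  # the 256 KB data cache mentioned in the post


def matrix_bytes(n, elem_bytes=8):
    """Storage for a dense n x n matrix with elem_bytes per element."""
    return n * n * elem_bytes


def fits_in_cache(n, elem_bytes=8, cache_bytes=CACHE_BYTES):
    """True if the full n x n matrix is no larger than the cache."""
    return matrix_bytes(n, elem_bytes) <= cache_bytes


def speedup(fast, slow):
    """Ratio of two throughput figures (e.g. Mflops)."""
    return fast / slow


if __name__ == "__main__":
    for n in (300, 500, 1000, 5000):
        kb = matrix_bytes(n) / 1024
        print(f"{n}x{n} doubles: {kb:.0f} KB, fits in cache: {fits_in_cache(n)}")
    print(f"measured 720 vs Sparc 2 (6.5 / 2.6):  {speedup(6.5, 2.6):.1f}x")
    print(f"published 720 vs Sparc 2 (17 / 4.2):  {speedup(17.0, 4.2):.1f}x")
```

Under these assumptions a 300x300 double-precision matrix takes 720,000 bytes (about 703 KB), well over the 256 KB cache, and the largest square double-precision matrix that fits whole is 181x181.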