Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!ucbvax!RICHTER.MIT.EDU!krowitz
From: krowitz@RICHTER.MIT.EDU (David Krowitz)
Newsgroups: comp.sys.apollo
Subject: Re:  Snakebytes (long -- and poisonous?)
Message-ID: <9103272114.AA31661@richter.mit.edu>
Date: 27 Mar 91 21:14:36 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 39

Nope! You are correct. They are *fast* machines even if
the application does not fit in cache ... but not nearly
as fast as the published numbers imply. The 55 MIPS of
the 720 versus the 28 MIPS of the Sparc 2 is a real
performance edge. The 6.5 Mflops of the 720 on a 300x300
LU decomposition is a real performance edge over the
2.6 Mflops of the Sparc 2 on the same test ... it's a
factor of roughly 2.5.

However, the *published* numbers being spread about are
17 Mflops for the 720 vs 4.2 Mflops for the Sparc 2 ...
a factor of 4 performance edge that is achievable only
with compilers that will not ship for another several
months, and only for smaller data sets.
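
To make the gap between the two claims concrete, here is a small
sketch that just redoes the arithmetic on the figures quoted above.
The dictionary and function names are made up for illustration; the
only real data are the four Mflops numbers from this post.

```python
# Mflops figures quoted in this post; the names are illustrative only.
measured = {"720": 6.5, "sparc2": 2.6}     # 300x300 LU decomposition
published = {"720": 17.0, "sparc2": 4.2}   # numbers being circulated

def speedup(figures):
    """Ratio of 720 throughput to Sparc 2 throughput."""
    return figures["720"] / figures["sparc2"]

measured_ratio = speedup(measured)
published_ratio = speedup(published)

print(f"measured edge:  {measured_ratio:.2f}x")
print(f"published edge: {published_ratio:.2f}x")
```

Both ratios favor the 720; the point is only that the measured one
is noticeably smaller than the published one.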

A 256 KB data cache is sufficient for many tasks (not
any of ours, unfortunately -- geophysics applications
tend to consider 500x500 systems of equations as *small*,
1000x1000 as moderate, and 5000x5000 as what-you-really-
want-to-do-for-your-thesis ;-0 ). It is critical, however,
for people to understand the conditions of a benchmark
run. Because most of the benchmarks HP quotes run in-cache
on the 700 series, they tend to represent best-case
results. Because most of the benchmarks do *not* run
in-cache on the DEC and Sun machines, the results tend
to be closer to the achievable performance levels for a
wider range of problems -- both large *and* small applications
run at a mere 4.2 Mflops on the Sparc 2, but only small
applications run at 13.5 Mflops on the 720 ... the large
ones run at 6.5 Mflops (unless it's single precision, 32-bit,
in which case it still runs at 13.5).
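
As a rough way to check whether your own problem is "small" in this
sense, the sketch below compares the storage for a dense NxN matrix
against the data cache size. It assumes the 256 figure above means
256 kilobytes, that the matrix is stored densely, and that double
precision is 8 bytes per element and single precision is 4; the
function name is hypothetical. It says nothing about how a given
compiler or LU routine actually walks through memory.

```python
CACHE_BYTES = 256 * 1024  # assumed: 256 kilobyte data cache

def matrix_bytes(n, bytes_per_element=8):
    """Storage for a dense n x n matrix (8 = double, 4 = single precision)."""
    return n * n * bytes_per_element

# Problem sizes mentioned in this post.
for n in (300, 500, 1000, 5000):
    size = matrix_bytes(n)
    ratio = size / CACHE_BYTES
    print(f"{n}x{n} doubles: {size} bytes, {ratio:.1f}x the data cache")
```

Under those assumptions a 500x500 double-precision matrix is
2,000,000 bytes, already several times the size of the cache, and
the larger systems are far beyond it.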

The key to all this is to *know* YOUR application, to
*know* the benchmark's characteristics, and to *know* what
compilers and/or tuning were used. It makes up to a factor
of 2 difference in the results.

== Dave