Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!uwm.edu!linac!att!ucbvax!RICHTER.MIT.EDU!krowitz
From: krowitz@RICHTER.MIT.EDU (David Krowitz)
Newsgroups: comp.sys.apollo
Subject: Re:  Snakebytes (long -- and poisonous?)
Message-ID: <9103271603.AA04216@richter.mit.edu>
Date: 27 Mar 91 16:03:23 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 35

One additional note on the performance numbers ...

The benchmarks used for the Mflop numbers fit within the
256 KB data cache of the 720/730/750 for both single and
double precision versions. If your application does *not*
fit within the data cache, and if it is also a 64-bit
floating point arithmetic application, then your performance
will fall by a factor of 2. The official 100x100 Linpack
benchmark fits entirely within the data cache for both
single and double precision versions; and both versions
achieved 13.5 Mflops in my testing. However, a 300x300 LU
decomposition benchmark (Jack Dongarra's LU benchmark
program testing the effects of loop unrolling and
parallel vector code) had a quite different result: the
single precision version ran at twice the speed of the
double precision version. Neither version fit within
the cache with the 300x300 problem (360 KB single precision,
720 KB double).
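
The footprint figures above follow from simple arithmetic, and a
rough sketch of the check is given here. It counts only the n x n
matrix itself (4 bytes per single precision element, 8 per double)
and ignores the right-hand side, pivot vector, and any padding a
real benchmark allocates. The names matrix_bytes, footprint_report
and CACHE_BYTES are made up for illustration, and reading "256 KB"
as 256 * 1024 bytes is an assumption.

```python
# Rough working-set check: does a dense n x n matrix fit in the data cache?
# Assumption: "256 KB" is taken as 256 * 1024 bytes; using 256,000 instead
# does not change any of the fits / does-not-fit answers below.
CACHE_BYTES = 256 * 1024


def matrix_bytes(n, bytes_per_element):
    """Bytes needed for the n x n matrix alone (no RHS, pivots, or padding)."""
    return n * n * bytes_per_element


def footprint_report(sizes=(100, 300), cache_bytes=CACHE_BYTES):
    """Return (n, precision, bytes, fits_in_cache) for each size and precision."""
    rows = []
    for n in sizes:
        for precision, width in (("single", 4), ("double", 8)):
            total = matrix_bytes(n, width)
            rows.append((n, precision, total, total <= cache_bytes))
    return rows


if __name__ == "__main__":
    for n, precision, total, fits in footprint_report():
        verdict = "fits" if fits else "does not fit"
        print(f"{n}x{n} {precision}: {total} bytes, {verdict}")
```

With these assumptions both 100x100 cases come out under the cache
size and both 300x300 cases come out over it, matching the 360 KB
and 720 KB figures quoted above.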

It should be noted that the data caches on the Sparcstations
and most of the DEC machines are smaller than even the 100x100
Linpack benchmark (the double precision version), so that the
Mflop numbers for these machines are not-in-cache, 64-bit
arithmetic results, while the HP700 numbers are in-cache,
64-bit arithmetic results.

Caveat Emptor! Know Your Benchmarks!


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)