Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!uwm.edu!linac!att!ucbvax!RICHTER.MIT.EDU!krowitz
From: krowitz@RICHTER.MIT.EDU (David Krowitz)
Newsgroups: comp.sys.apollo
Subject: Re: Snakebytes (long -- and poisonous?)
Message-ID: <9103271603.AA04216@richter.mit.edu>
Date: 27 Mar 91 16:03:23 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 35

One additional note on the performance numbers ...

The benchmarks used for the Mflop numbers fit within the 256 KB data cache of the 720/730/750 in both their single- and double-precision versions. If your application does *not* fit within the data cache, and it also uses 64-bit floating-point arithmetic, then your performance will fall by a factor of 2.

The official 100x100 Linpack benchmark fits entirely within the data cache in both single and double precision, and both versions achieved 13.5 Mflops in my testing. However, a 300x300 LU decomposition benchmark (Jack Dongarra's LU benchmark program, which tests the effects of loop unrolling and parallel vector code) gave quite a different result: the single-precision version ran at twice the speed of the double-precision version. Neither version of the 300x300 problem fits within the cache (360 KB in single precision, 720 KB in double).

It should be noted that the data caches on the SPARCstations and most of the DEC machines are smaller than even the 100x100 Linpack benchmark (double-precision version), so the Mflop numbers for those machines are not-in-cache, 64-bit arithmetic results, while the HP700 numbers are for in-cache 64-bit arithmetic.

Caveat Emptor! Know Your Benchmarks!

--
David Krowitz

krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
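The cache-fit arithmetic behind the post's 100x100 vs 300x300 results can be checked with a few lines of Python. This is only a sketch under stated assumptions: the cache is taken as 256 KB = 262,144 bytes, elements are 4-byte (single) or 8-byte (double) IEEE 754 values, and only the matrix itself is counted (no right-hand side, pivot array or work space). The names `DCACHE_BYTES`, `matrix_bytes` and `fits_in_cache` are hypothetical, introduced here for illustration.

```python
# Assumed data cache size: 256 KB, as quoted for the HP 720/730/750.
DCACHE_BYTES = 256 * 1024

# Assumed element sizes: IEEE 754 single (4 bytes) and double (8 bytes).
ELEM_BYTES = {"single": 4, "double": 8}


def matrix_bytes(n: int, precision: str) -> int:
    """Bytes occupied by a dense n x n matrix (hypothetical helper).

    Counts only the matrix; vectors, pivots and work space are ignored.
    """
    return n * n * ELEM_BYTES[precision]


def fits_in_cache(n: int, precision: str) -> bool:
    """True if the n x n matrix alone fits in the assumed data cache."""
    return matrix_bytes(n, precision) <= DCACHE_BYTES


if __name__ == "__main__":
    for n in (100, 300):
        for p in ("single", "double"):
            size = matrix_bytes(n, p)
            verdict = "fits" if fits_in_cache(n, p) else "does not fit"
            print(f"{n}x{n} {p}: {size} bytes, {verdict}")
```

Under these assumptions the 100x100 double-precision matrix needs 80,000 bytes and fits, while even the 300x300 single-precision matrix needs 360,000 bytes and already exceeds the cache, which matches the 360 KB / 720 KB figures in the post.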