Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: IEEE floating point
Message-ID: <MCCALPIN.91May27092950@pereland.cms.udel.edu>
Date: 27 May 91 13:29:50 GMT
References: <9105250030.AA08036@ucbvax.Berkeley.EDU>
	<1991May25.222551.16365@zoo.toronto.edu>
	<MCCALPIN.91May26100257@pereland.cms.udel.edu>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 72
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: mccalpin@perelandra.cms.udel.edu's message of 26 May 91 14:02:57 GMT

> On 26 May 91 14:02:57 GMT, mccalpin@perelandra.cms.udel.edu, I said:

>>> On 25 May 91 22:25:51 GMT, henry@zoo.toronto.edu (Henry Spencer) said:

Henry> I will confine myself to observing that IBM hex FP is the only
Henry> FP format I know of that made half the FP instructions -- the
Henry> single-precision ones -- just about useless to most
Henry> programmers.

Me> I have to point out that the Cyber 205/ETA-10 32-bit FP is worse than
Me> the IBM's hex FP --- at least for the cases I tested.  I discussed
Me> some of this in a paper in Supercomputer in 1987 or 1988 (issue 24, I
Me> believe).  The more interesting part was an analysis of the roundoff
Me> error in the 1000x1000 LINPACK benchmark, which unfortunately did not
Me> make it into the paper in its completed form.

Just so no one can complain too much about "anecdotal evidence", here
are the results from my study:

   Errors in solution of 1000x1000 system of equations
           from the LINPACK benchmark suite

machine                 precision       RMS error
---------------------------------------------------------
ETA-10                   32-bit         2.2e-01
IBM 3081                 32-bit         2.4e-03
VAX 8700                 32-bit         3.9e-04
IEEE (Sun-3)             32-bit         2.8e-04
IBM RS/6000              32-bit         2.8e-04

ETA-10                   64-bit         1.3e-08
IBM RS/6000              64-bit         1.3e-10
Cray X/MP                64-bit         2.5e-11
IBM 3090                 64-bit         5.8e-12
IBM RS/6000 (-qnomaf)    64-bit         2.2e-12
IEEE (Sun-3)             64-bit         2.3e-13
VAX "D"-format           64-bit         7.2e-14

ETA-10                  128-bit         1.6e-22
Cray X/MP               128-bit         4.2e-26
---------------------------------------------------------

Notes:
(1) The LINPACK 1000x1000 matrix is set up so that the exact solution
is a vector whose elements are all equal to 1.0.  The RMS error is
calculated (with a divisor of n-1 = 999) by
	err2 = 0.0d0
	do i=1,1000
	   err2 = err2 + (x(i)-one)**2
	end do
	rms = sqrt(err2/999.)
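
For readers who want to repeat the measurement on another machine,
here is a minimal sketch of the same calculation in Python.  The
function name rms_error and the sort_first option are my own
assumptions, not part of the LINPACK code; sort_first simply adds the
squared errors smallest-first, which is a cheap way to see whether the
accumulation itself is contributing roundoff.

```python
import math

def rms_error(x, exact=1.0, sort_first=False):
    """RMS deviation of the entries of x from `exact`.

    Uses the same n-1 divisor as the Fortran fragment above
    (999 for a 1000-element solution vector).
    """
    squared = [(xi - exact) ** 2 for xi in x]
    if sort_first:
        # Hypothetical option: accumulate the smallest terms first.
        squared.sort()
    err2 = 0.0
    for s in squared:
        err2 += s
    return math.sqrt(err2 / (len(x) - 1))

if __name__ == "__main__":
    # Made-up solution vector with errors of alternating sign.
    x = [1.0 + 1e-6 * (-1) ** i for i in range(1000)]
    print(rms_error(x), rms_error(x, sort_first=True))
```

If the two printed values agree, the order of summation is not what
limits the accuracy of the error estimate.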

(2) The reason for the difference between the 64-bit IEEE (Sun-3) and
IBM RS/6000 errors is unclear to me.  Someone at IBM Austin gave me a
long explanation a year or two ago of why the enhanced accuracy led to
a larger computed error, but I never really understood it.  To check
the accuracy of the IEEE error estimates, I sorted the errors and
summed them in order of increasing magnitude, and got identical
results.
Note that the -qnomaf flag prohibits the compiler from using the
enhanced-accuracy combined multiply/add instruction.  This should give
IEEE-compliant results, but the error (2.2e-12) is still about ten
times the Sun-3 figure (2.3e-13).
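
To make concrete what a combined multiply/add changes, here is a small
sketch in Python.  It is only an emulation under my own assumptions:
the function names are hypothetical, and the "fused" version uses exact
rational arithmetic to compute a*b + c and round once, rather than any
real RS/6000 instruction.  It shows that the two forms can give
different answers; it does not explain the LINPACK numbers above.

```python
from fractions import Fraction

def unfused_multiply_add(a, b, c):
    """a*b is rounded to a double, then the sum is rounded again."""
    return a * b + c

def fused_multiply_add(a, b, c):
    """Emulated fused multiply-add: exact a*b + c, rounded once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# Inputs chosen so that a*b = 1 - 2**-60 exactly, which is not
# representable in a 64-bit double and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

if __name__ == "__main__":
    print(unfused_multiply_add(a, b, c))  # 0.0
    print(fused_multiply_add(a, b, c))    # -2**-60, about -8.7e-19
```

The unfused form loses the low-order bits of the product before the
add, while the fused form keeps them, so the two results differ even
though both are "correct" under their own rounding rules.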

(3) The IEEE results are from a Sun-3, but have been duplicated on
Sun-4 and SGI (MIPS) machines.

(4) I will post the answers for 128-bit arithmetic on the IBM RS/6000
as soon as I get the new compiler installed....
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET