Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!lll-winken!uunet!littlei!omepd!radix!jimv
From: jimv@radix (Jim Valerio)
Newsgroups: comp.arch
Subject: Re: Query about miserable M68882 performance [really 80-bit notes]
Message-ID: <81@radix>
Date: 8 Apr 89 00:56:29 GMT
References: <2583@tank.uchicago.edu> <7829@pyr.gatech.EDU> <16543@winchester.mips.COM>
Reply-To: jimv@radix.UUCP (Jim Valerio)
Organization: Radix MicroSystems, Beaverton
Lines: 116

In article <16543@winchester.mips.COM> John Mashey asks about systems
that keep intermediate floating-point results in an extended format.
I feel compelled to respond, seeing that I'm an advocate of systems
supporting extended precision.  I'll first give an example how extended
precision helps, and then try to answer John's question of how well keeping
intermediate results in extended precision works.

Think of extended precision as a data format that provides slightly better
error control without the hardware or execution time costs of going to the
next larger format with twice the precision.

Extended precision gives you two advantages over the extended number's base
format: a wider exponent range and a wider significand.  When computing the
intermediate results of an expression:
  (1) the wider exponent makes overflow and underflow unlikely, and
  (2) the wider significand masks rounding errors.

So what?

Here's an example.  Let's compute cabs(x+yi), or hypot(x,y).

With an extended exponent range available and intermediate results all stored
in extended precision, hypot(x,y) can be computed without fear of overflow or
underflow with the simple formula:

		sqrt(x*x + y*y)

Here's what the 4.3bsd libm suggests as an algorithm to get around the
overflow and underflow problem when extended precision is not available.

		x = |x|
		y = |y|
		swap x,y if y>x
		if (x==0) return(0);
		if (y==0) return(x);
		exp = logb(x);
		x = scalb(x,-exp);
		y = scalb(y,-exp);
		return scalb(sqrt(x*x+y*y),exp);

However, the last line suffers from an undesirably large accumulated roundoff
error.  So here's how the 4.3bsd libm actually implements hypot(x,y):

	1. replace x by |x| and y by |y|, and swap x and
	   y if y > x (hence x is never smaller than y).
	2. Hypot(x,y) is computed by:

   Case I, x/y > 2
		
			       y
	hypot = x + -----------------------------
		 		    2
		    sqrt ( 1 + [x/y]  )  +  x/y

   Case II, x/y <= 2 
			                   y
	hypot = x + --------------------------------------------------
			          		     2 
			     			[x/y]   -  2
		   (sqrt(2)+1) + (x-y)/y + -----------------------------
		 		    			  2
		    			  sqrt ( 1 + [x/y]  )  + sqrt(2)


Pretty frightful cost just for a little accuracy, isn't it?

My experience is that this problem crops up in most math library functions.


Now back to John's question:
>2) How well [does keeping intermediate results in extended precision] work?

The LINPACK benchmark reports on the error of the arithmetic operations.
Here are the results when run on the 80960KB, a machine that implements
operations on extended precision operands.

				norm. resid	resid		machep
Double precision (extended)	1.50		6.66e-14	2.22e-16
Double precision (double)	1.67		7.46e-14	2.22e-16

The only difference between the extended and the double entries is that
in the extended entry, the intermediate result of A*X(I) is in extended
precision while the double case keeps the result in double precision.

Some months ago, John McCalpin (mccalpin@loligo.fsu.edu) reported in
<350@loligo.fsu.edu> some results from the 1000x1000 LINPACK benchmark
that might confirm the same observation using results from the Sun 3/280.
I say might because I am just presuming the results were run with an 68881,
and may or may not have been compiled using an extended-precision evaluation
model.

>machine		precision	    RMS error in solution (*)
>---------------------------------------------------------------------
>Cyber 205 / ETA-10	32-bit			2.22521256e-01
>IBM 3081		32-bit			2.37465184e-03
>IEEE standard		32-bit			2.82104476e-04
>---------------------------------------------------------------------
>Cyber 205 / ETA-10	64-bit			1.32111221e-08
>Cray X/MP		64-bit			2.47078473e-11
>IEEE standard		64-bit			2.27274978e-13
>---------------------------------------------------------------------
>NOTES:
>   (*) The solution vector consists of 1000 identical elements = 1.0
>   (1) The IEEE standard was run on a Sun 3/280, which had passed
>	the PARANOIA benchmark test.


In general, I don't know of any comprehensive data on how well extended
precision hides anomolies in the expression evalation found in practice
in typical floating-point applications.  Just as a well implemented math
library doesn't attract attention, attention either because it is too
slow or because it is too inaccurate, I believe that the benefit of
extended precision in expression evaluation is bound to be noticed
only by the lack of observed problems.
-- 
Jim Valerio	jimv%radix@omepd.intel.com, {verdix,tektronix,omepd}!radix!jimv