Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!lll-winken!uunet!littlei!omepd!radix!jimv From: jimv@radix (Jim Valerio) Newsgroups: comp.arch Subject: Re: Query about miserable M68882 performance [really 80-bit notes] Message-ID: <81@radix> Date: 8 Apr 89 00:56:29 GMT References: <2583@tank.uchicago.edu> <7829@pyr.gatech.EDU> <16543@winchester.mips.COM> Reply-To: jimv@radix.UUCP (Jim Valerio) Organization: Radix MicroSystems, Beaverton Lines: 116 In article <16543@winchester.mips.COM> John Mashey asks about systems that keep intermediate floating-point results in an extended format. I feel compelled to respond, seeing that I'm an advocate of systems supporting extended precision. I'll first give an example how extended precision helps, and then try to answer John's question of how well keeping intermediate results in extended precision works. Think of extended precision as a data format that provides slightly better error control without the hardware or execution time costs of going to the next larger format with twice the precision. Extended precision gives you two advantages over the extended number's base format: a wider exponent range and a wider significand. When computing the intermediate results of an expression: (1) the wider exponent makes overflow and underflow unlikely, and (2) the wider significand masks rounding errors. So what? Here's an example. Let's compute cabs(x+yi), or hypot(x,y). With an extended exponent range available and intermediate results all stored in extended precision, hypot(x,y) can be computed without fear of overflow or underflow with the simple formula: sqrt(x*x + y*y) Here's what the 4.3bsd libm suggests as an algorithm to get around the overflow and underflow problem when extended precision is not available. x = |x| y = |y| swap x,y if y>x if (x==0) return(0); if (y==0) return(x); exp = logb(x); x = scalb(x,-exp); y = scalb(y,-exp); return scalb(sqrt(x*x+y*y),exp); However, the last line suffers from an undesirably large accumulated roundoff error. So here's how the 4.3bsd libm actually implements hypot(x,y): 1. replace x by |x| and y by |y|, and swap x and y if y > x (hence x is never smaller than y). 2. Hypot(x,y) is computed by: Case I, x/y > 2 y hypot = x + ----------------------------- 2 sqrt ( 1 + [x/y] ) + x/y Case II, x/y <= 2 y hypot = x + -------------------------------------------------- 2 [x/y] - 2 (sqrt(2)+1) + (x-y)/y + ----------------------------- 2 sqrt ( 1 + [x/y] ) + sqrt(2) Pretty frightful cost just for a little accuracy, isn't it? My experience is that this problem crops up in most math library functions. Now back to John's question: >2) How well [does keeping intermediate results in extended precision] work? The LINPACK benchmark reports on the error of the arithmetic operations. Here are the results when run on the 80960KB, a machine that implements operations on extended precision operands. norm. resid resid machep Double precision (extended) 1.50 6.66e-14 2.22e-16 Double precision (double) 1.67 7.46e-14 2.22e-16 The only difference between the extended and the double entries is that in the extended entry, the intermediate result of A*X(I) is in extended precision while the double case keeps the result in double precision. Some months ago, John McCalpin (mccalpin@loligo.fsu.edu) reported in <350@loligo.fsu.edu> some results from the 1000x1000 LINPACK benchmark that might confirm the same observation using results from the Sun 3/280. I say might because I am just presuming the results were run with an 68881, and may or may not have been compiled using an extended-precision evaluation model. >machine precision RMS error in solution (*) >--------------------------------------------------------------------- >Cyber 205 / ETA-10 32-bit 2.22521256e-01 >IBM 3081 32-bit 2.37465184e-03 >IEEE standard 32-bit 2.82104476e-04 >--------------------------------------------------------------------- >Cyber 205 / ETA-10 64-bit 1.32111221e-08 >Cray X/MP 64-bit 2.47078473e-11 >IEEE standard 64-bit 2.27274978e-13 >--------------------------------------------------------------------- >NOTES: > (*) The solution vector consists of 1000 identical elements = 1.0 > (1) The IEEE standard was run on a Sun 3/280, which had passed > the PARANOIA benchmark test. In general, I don't know of any comprehensive data on how well extended precision hides anomolies in the expression evalation found in practice in typical floating-point applications. Just as a well implemented math library doesn't attract attention, attention either because it is too slow or because it is too inaccurate, I believe that the benefit of extended precision in expression evaluation is bound to be noticed only by the lack of observed problems. -- Jim Valerio jimv%radix@omepd.intel.com, {verdix,tektronix,omepd}!radix!jimv