Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!tardis!tymix!uunet!validgh!dgh
From: dgh@validgh.com (David G. Hough on validgh)
Newsgroups: comp.arch
Subject: rounded vs. chopped floating-point arithmetic
Message-ID: <402@validgh.com>
Date: 20 Jun 91 13:50:36 GMT
Organization: validgh, PO Box 20370, San Jose, CA 95160
Lines: 96

Nelson Beebe (beebe@math.utah.edu) recalled the following message to a
colleague:

-------------------------------------------

The following little program can be used to illustrate the effect that
truncating arithmetic has on your larger program:

      real dt,t0,t1,t2,tend
      integer n
      n = 0
      dt = 0.018
      t0 = 4000.0
      tend = 5000.0
      t1 = t0
      t2 = t0
c     t1 is a running sum: one rounding error per step
c     t2 is evaluated directly: two rounding errors in total
   10 n = n + 1
      t1 = t1 + dt
      t2 = t0 + float(n)*dt
      if (t2 .lt. tend) go to 10
      write (6,*) t1,t2,(t1 - t2)/t2
      end

On the IBM 3090, this single precision version prints:

   4879.89844  5000.00781  -0.240218341E-01

That is, the relative error is 2.4%.  On the Sun 4, it produces

   5003.70  5000.01  7.37889E-04

The effect of truncating arithmetic on the running sum is large.

The double precision version is:

      double precision dt,t0,t1,t2,tend
      integer n
      n = 0
      dt = 0.018D+00
      t0 = 4000.0D+00
      tend = 5000.0D+00
      t1 = t0
      t2 = t0
c     same loop as above, carried out in double precision
   10 n = n + 1
      t1 = t1 + dt
      t2 = t0 + dfloat(n)*dt
      if (t2 .lt. tend) go to 10
      write (6,*) t1,t2,(t1 - t2)/t2
      end

The IBM 3090 result is

   5000.00799995563648  5000.00799999999981  -0.887265231637227285E-11

The Sun 4 result is

   5000.0080000016  5000.0080000000  3.2341579848518D-13

-------------------------------------------

Note that satisfactory results are obtained if you use enough precision
or if you round rather than chop.  Also note that this is not the
program that failed, but rather a drastic simplification of the user's
actual application to reveal the essential problem.  It's a simple
example where the superior statistics of rounding rather than chopping
imply a broader domain of applicability for a particular program.

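
The single precision experiment above can be replayed on any machine by
simulating the two arithmetics in software.  What follows is only a
sketch under stated assumptions, not the original poster's method: it
assumes the IBM 3090 single format behaves as 6 hexadecimal digits with
chopping, that the Sun 4 single format behaves as 24 binary digits with
round-to-nearest, and that a Python double is wide enough to hold each
intermediate sum and product exactly before it is cut back to single
precision.  The names `quantize` and `run` are hypothetical helpers
invented for this illustration.

```python
import math


def quantize(x, radix_bits, digits, mode):
    """Cut x to `digits` significant digits in radix 2**radix_bits.

    mode is "chop" (truncate toward zero) or "round" (nearest, ties to even).
    The result depends only on the exact value x, never on the operands.
    """
    if x == 0.0:
        return 0.0
    _, e2 = math.frexp(x)                  # 2**(e2-1) <= |x| < 2**e2
    e = -(-e2 // radix_bits)               # ceil(e2 / radix_bits)
    ulp = math.ldexp(1.0, radix_bits * (e - digits))
    scaled = x / ulp                       # exact: ulp is a power of two
    n = math.trunc(scaled) if mode == "chop" else round(scaled)
    return n * ulp


def run(radix_bits, digits, mode):
    """Replay the loop from the post in the simulated single format."""
    def q(v):
        return quantize(v, radix_bits, digits, mode)

    dt, t0, tend = q(0.018), q(4000.0), q(5000.0)
    t1 = t2 = t0
    n = 0
    while True:
        n += 1
        t1 = q(t1 + dt)                    # running sum: one rounding per step
        t2 = q(t0 + q(n * dt))             # direct evaluation: two roundings
        if not t2 < tend:
            break
    # The relative difference is formed in double precision for simplicity.
    return t1, t2, (t1 - t2) / t2


for name, args in (("hex, 6 digits, chop", (4, 6, "chop")),
                   ("binary, 24 bits, round", (1, 24, "round"))):
    t1, t2, rel = run(*args)
    print(f"{name:24s} t1={t1:.5f} t2={t2:.5f} rel={rel:.6e}")
```

If the model is faithful, the two printed lines should land close to
the IBM 3090 and Sun 4 figures quoted above.  The mechanism is visible
in the model: above 4096 the spacing of 6-digit hexadecimal numbers is
1/256, so dt = 0.018 is about 4.6 units in the last place and every
chopped addition advances t1 by only 4 of them.
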
Correct rounding and chopping, and several other good paradigms, can be
characterized by the property:

        The rounded computed result is chosen from the two
        machine-representable numbers nearest the unrounded
        infinitely-precise exact result, according to a rule that
        depends only on the infinitely-precise exact result, and not
        on the operands or operation (or phase of moon).

Most "fast" sort-of-rounding or sort-of-chopping schemes invented by
hardware designers eventually frustrate error analysts because they
can't be so characterized.

As for the first IBM RS/6000 implementation, I have heard that the
original floating-point unit was designed to implement IBM 370
arithmetic, and was changed to IEEE 754 format relatively late in the
game.  If true then it would not be surprising that some aspects of 754
were problematic to add in.  The interesting question would then be
which aspects of IEEE arithmetic will be really problematic for a
high-performance RS/6000 implementation designed from scratch to
support 754.
-- 

David Hough

dgh@validgh.com		uunet!validgh!dgh	na.hough@na-net.ornl.gov