Path: utzoo!attcan!uunet!cs.utexas.edu!romp!auschs!awdprime!troyhix.austin.ibm.com!troy From: troy@troyhix.austin.ibm.com Newsgroups: comp.arch Subject: Re: RS6000 Multiply/Accumulate instruction Message-ID: <1872@awdprime.UUCP> Date: 22 Mar 90 18:12:55 GMT Sender: news@awdprime.UUCP Reply-To: ...@cs.utexas.edu:ibmaus!auschs!troyhix.austin.ibm.com!troy Organization: IBM Corporation - Advanced Workstation Division Lines: 164 >> >>Here is the result of the "non-IEEE-ness" of the rounding of the >> >>--------------------------------------------------------- >> Errors in solution of the 1000x1000 system of equations >> from the LINPACK benchmark suite >>machine precision RMS error >>--------------------------------------------------------- >>ETA-10 32-bit 2.2e-01 >>IBM 3081 32-bit 2.4e-03 >>VAX 8700 32-bit 3.9e-04 >>IBM RS/6000 32-bit 2.8e-04 <-- new ibm >>IEEE (Sun-3) 32-bit 2.8e-04 <-- ieee >>--------------------------------------------------------- >>ETA-10 64-bit 1.3e-08 >>Cray X/MP 64-bit 2.5e-11 >>IBM 3090 64-bit 5.8e-12 >>IBM RS/6000 64-bit 1.2e-12 <-- new ibm >>IEEE (Sun-3) 64-bit 2.3e-13 <-- ieee >>VAX "D"-format 64-bit 7.2e-14 >>--------------------------------------------------------- >>ETA-10 128-bit 1.6e-22 >>Cray X/MP 128-bit 4.2e-26 >>--------------------------------------------------------- >> >>(1) The IBM has no extra error in the 32-bit case, but >> loses a factor of 5 (about 2 bits) in the 64-bit case. >> This is still a factor of almost 5 better than the >> results from the IBM mainframe hex-exponent perversity. >>(2) The Cray and ETA-10 have larger exponent fields for >> 64-bit numbers than the other formats. This accounts >> for several bits of the difference. The trouble with >> the ETA-10 format is discussed in my article in >> Supercomputer, issue 24. >> >>John D. McCalpin - mccalpin@vax1.acs.udel.edu >> mccalpin@delocn.udel.edu >> mccalpin@scri1.scri.fsu.edu >>-- >>John D. McCalpin - mccalpin@vax1.acs.udel.edu >> mccalpin@delocn.udel.edu >> mccalpin@scri1.scri.fsu.edu >> >> >> Needless to say this has caused quite a stir here. :-( I brought this to the attention of our resident Floating-point math guru, Dr. Peter Markstein of IBM Yorktown research, who was one of the original 801 (first RISC) team member. Lately, reports of RMS errors from LINPACK have been circulating in comp.arch. The reports show that the IBM RiscSystem/6000 gets a larger RMS error than the SUN3 using standard double precision IEEE arithmetic. How can this be? Computing residuals is a very difficult task. To do it well, it must be done in a greater precision than the precision used to solve the equations. The reason for this is that in computing residuals, there is alot of cancellation of significant bits. After all, if we were computing with infinite precision, the addition of the very last term of the inner product used to compute the residual would produce a zero (cancelling all the digits). We can say very little about the residuals that the machines report using their own maximum precision arithmetic. However, we observe that a machine with standard IEEE arithmetic can very easily understate the size of the residual. Attached below are two programs: the first solves a trivial set of equations and then computes the RMS of the residuals. Indeed, the IBM '6000 gets a larger RMS than does a standard IEEE machine using the same precision. But the IBM '6000 computes each term of the residual vector EXACTLY, so there is no doubt as to which computation is more trustworthy. The second program works explicitly with the components of the residuals of the first. It computes the reciprocals of all the integers between 1 and 1000. It then reports the reciprocal to be exact if i * (1/i) - 1.0 is exactly 0.0. Now we KNOW that that i * (1/i) computes to EXACTLY 1.0 using floating point arithmetic only when i is a power of two. Yet, on a standard IEEE machine, the second program produces far more than 10 lines of output! To get back to the accuracy of residuals, one method of solving equations in the days that it was done by desk calculator was to first solve AX = B getting an approximate solution X'. The residual was computed as R = B - AX', with the arithmetic done in double the usual precision and the residual R then rounded to standard precision. A correction term was then computed by solving AY = R, yielding an approximate solution Y'. Then X' + Y' was an improved solution, usually giving full precision accuracy. This method IS NOT used in computers because, using full precision floating point, the residual R has zero significant bits. Using it to refine the solution does no good at all. Only if R can be computed by machine the way it was on paper, with extra precision, is iterative refinement useful. On most computers, this is simply too slow. On the '6000, however, the residual CAN be computed with reasonable effort, and iterative refinement may make a comeback as a useful computational technique. Student School of Hard Knocks C Program 1 - Solving a trivial set of linear equations implicit real*8(a-h,o-z) dimension a(1000), x(1000), b(1000) C This program solves the set of 1000 equations AX = B C for the very simple case in which A is the diagonal C matrix with A(i,i) = i, and B(i) = 1. Of course, we C know that the result is X(i) = 1/i, as best that 1/i C can be represented. Since we all are IEEE machines, C we will all get the same results for X. C Then we compute the residuals and sum the squares. C We know that most of the X(i) are not exact, because C only powers of two have reciprocals that can be repre- C sented exactly in binary. So the sum of the squares C of the residuals should be non-zero. The IBM C RiscSystem/6000 computes all the residuals exactly. C However, a standard IEEE machine will report zero C for many of the components of the residual vector, C and thus reports a lower RMS! do 1 i = 1, 1000 a(i) = i b(i) = 1.0 1 x(i) = b(i) / a(i) C The above loop set up the equations and solved them C Now we compute the residuals and sum the squares. sum = 0.0 do 2 i = 1, 1000 2 sum = sum + (b(i) - a(i) * x(i))**2 C Now we get the RMS ... rms = Sqrt(sum / 1000.0) write(*,10)rms 10 format(' The RMS for solving nx = 1, n = 1...1000, is',e15.7) stop end C Program 2 -- Determining if reciprocals have an exact representation implicit real*8(a-h,o-z) do 100 i = 1 , 1000 C We will compute reciprocals of the first 1000 C integers, and report the reciprocal to be exact C if the product of the integer and its computer C reciprocal are exactly 1.0 x = i r = 1.0/x if ((x*r - 1.0) .eq. 0) write(*,10)i 100 continue stop 10 format(' The integer ', i4, ' has an exact reciprocal in binary') end Peter Markstein To Reply directly to Dr. Markstein his Email Address is: USA InterNet: uunet!cs.utexas.edu!ibmaus!auschs!mrkstn.austin.ibm.com!mrkstn or ...@cs.utexas.edu:ibmaus!auschs!mrkstn.austin.ibm.com!mrkstn ><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>< IBM Corp. - Adv. Workstation Div Troy N. Hicks IBM Tieline: 678-7318 Mail Stop: ZIP 4359 USA Phone: (512)838-7318 Austin, Texas 78758 IBM VNet: TROYHIX at AUSVM9 IBM InterNet: troy@troyhix.austin.ibm.com USA InterNet: uunet!cs.utexas.edu!ibmaus!auschs!troyhix.austin.ibm.com!troy or ...@cs.utexas.edu:ibmaus!auschs!troyhix.austin.ibm.com!troy /* I DO NOT speak for IBM, only for MYSELF */ ><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><