Path: utzoo!attcan!uunet!cs.utexas.edu!romp!auschs!awdprime!troyhix.austin.ibm.com!troy
From: troy@troyhix.austin.ibm.com
Newsgroups: comp.arch
Subject: Re: RS6000 Multiply/Accumulate instruction
Message-ID: <1872@awdprime.UUCP>
Date: 22 Mar 90 18:12:55 GMT
Sender: news@awdprime.UUCP
Reply-To: ...@cs.utexas.edu:ibmaus!auschs!troyhix.austin.ibm.com!troy
Organization: IBM Corporation - Advanced Workstation Division
Lines: 164

>>
>>Here is the result of the "non-IEEE-ness" of the rounding of the
>>
>>---------------------------------------------------------
>>   Errors in solution of the 1000x1000 system of equations
>>           from the LINPACK benchmark suite
>>machine                 precision               RMS error
>>---------------------------------------------------------
>>ETA-10                   32-bit                 2.2e-01
>>IBM 3081                 32-bit                 2.4e-03
>>VAX 8700                 32-bit                 3.9e-04
>>IBM RS/6000              32-bit                 2.8e-04  <-- new ibm
>>IEEE (Sun-3)             32-bit                 2.8e-04  <-- ieee
>>---------------------------------------------------------
>>ETA-10                   64-bit                 1.3e-08
>>Cray X/MP                64-bit                 2.5e-11
>>IBM 3090                 64-bit                 5.8e-12
>>IBM RS/6000              64-bit                 1.2e-12  <-- new ibm
>>IEEE (Sun-3)             64-bit                 2.3e-13  <-- ieee
>>VAX "D"-format           64-bit                 7.2e-14
>>---------------------------------------------------------
>>ETA-10                  128-bit                 1.6e-22
>>Cray X/MP               128-bit                 4.2e-26
>>---------------------------------------------------------
>>
>>(1) The IBM has no extra error in the 32-bit case, but
>>    loses a factor of 5 (about 2 bits) in the 64-bit case.
>>    This is still a factor of almost 5 better than the
>>    results from the IBM mainframe hex-exponent perversity.
>>(2) The Cray and ETA-10 have larger exponent fields for
>>    64-bit numbers than the other formats.  This accounts
>>    for several bits of the difference.  The trouble with
>>    the ETA-10 format is discussed in my article in
>>    Supercomputer, issue 24.
>>
>>John D. McCalpin - mccalpin@vax1.acs.udel.edu
>>                   mccalpin@delocn.udel.edu
>>                   mccalpin@scri1.scri.fsu.edu
>>--
>>John D. McCalpin - mccalpin@vax1.acs.udel.edu
>>                   mccalpin@delocn.udel.edu
>>                   mccalpin@scri1.scri.fsu.edu
>>
>>
>>

    Needless to say this has caused quite a stir here. :-(
I brought this to the attention of our resident Floating-point math
guru, Dr. Peter Markstein of IBM Yorktown research, who was one of the
original 801 (first RISC) team member.


   Lately, reports of RMS errors from LINPACK have been circulating
in comp.arch.  The reports show that the IBM RiscSystem/6000 gets a
larger RMS error than the SUN3 using standard double precision IEEE
arithmetic.  How can this be?
   Computing residuals is a very difficult task.  To do it well, it must
be done in a greater precision than the precision used to solve the
equations.  The reason for this is that in computing residuals, there
is alot of cancellation of significant bits.  After all, if we were
computing with infinite precision, the addition of the very last
term of the inner product used to compute the residual would produce
a zero (cancelling all the digits).  We can say very little about the
residuals that the machines report using their own
maximum precision arithmetic.
   However, we observe that a machine with standard IEEE arithmetic can
very easily understate the size of the residual.  Attached below are
two programs:  the first solves a trivial set of equations and then
computes the RMS of the residuals.  Indeed, the IBM '6000 gets a larger RMS
than does a standard IEEE machine using the same precision.  But the IBM '6000
computes each term of the residual vector EXACTLY, so there is no doubt
as to which computation is more trustworthy.  The second program works
explicitly with the components of the residuals of the first.  It
computes the reciprocals of all the integers between 1 and 1000.  It
then reports the reciprocal to be exact if i * (1/i) - 1.0 is exactly 0.0.
Now we KNOW that that i * (1/i) computes to EXACTLY 1.0 using floating
point arithmetic only when i is a power of two.  Yet, on a standard IEEE
machine, the second program produces far more than 10 lines of output!
   To get back to the accuracy of residuals, one method of solving equations
in the days that it was done by desk calculator was to first solve
    AX = B
getting an approximate solution X'.  The residual was computed as
   R = B - AX', with the arithmetic done in double the usual precision
and the residual R then rounded to standard precision.  A correction term
was then computed by solving AY = R, yielding an approximate solution Y'.
Then X' + Y' was an improved solution, usually giving full precision accuracy.
This method IS NOT used in computers because, using full precision floating
point, the residual R has zero significant bits.  Using it to refine the
solution does no good at all.  Only if R can be computed by machine the
way it was on paper, with extra precision, is iterative refinement useful.
On most computers, this is simply too slow.  On the '6000, however,
the residual CAN be computed with reasonable effort, and iterative
refinement may make a comeback as a useful computational technique.
    Student
    School of Hard Knocks

C   Program 1 - Solving a trivial set of linear equations
         implicit real*8(a-h,o-z)
         dimension a(1000), x(1000), b(1000)
C   This program solves the set of 1000 equations AX = B
C   for the very simple case in which A is the diagonal
C   matrix with A(i,i) = i, and B(i) = 1.  Of course, we
C   know that the result is X(i) = 1/i, as best that 1/i
C   can be represented.  Since we all are IEEE machines,
C   we will all get the same results for X.
C      Then we compute the residuals and sum the squares.
C   We know that most of the X(i) are not exact, because
C   only powers of two have reciprocals that can be repre-
C   sented exactly in binary.  So the sum of the squares
C   of the residuals should be non-zero.  The IBM
C   RiscSystem/6000 computes all the residuals exactly.
C   However, a standard IEEE machine will report zero
C   for many of the components of the residual vector,
C   and thus reports a lower RMS!
        do 1 i = 1, 1000
           a(i) = i
           b(i) = 1.0
1          x(i) = b(i) / a(i)
C   The above loop set up the equations and solved them
C   Now we compute the residuals and sum the squares.
       sum = 0.0
       do 2 i = 1, 1000
2         sum = sum + (b(i) - a(i) * x(i))**2
C   Now we get the RMS ...
       rms = Sqrt(sum / 1000.0)
       write(*,10)rms
10     format(' The RMS for solving nx = 1, n = 1...1000, is',e15.7)
       stop
       end

C   Program 2 -- Determining if reciprocals have an exact representation
           implicit real*8(a-h,o-z)
           do 100 i = 1 , 1000
C             We will compute reciprocals of the first 1000
C             integers, and report the reciprocal to be exact
C             if the product of the integer and its computer
C             reciprocal are exactly 1.0
              x = i
              r = 1.0/x
              if ((x*r - 1.0) .eq. 0) write(*,10)i
100           continue
              stop
10    format(' The integer ', i4, ' has an exact reciprocal in binary')
      end


                              Peter Markstein
   To Reply directly to Dr. Markstein his Email Address is:
   USA InterNet: 
uunet!cs.utexas.edu!ibmaus!auschs!mrkstn.austin.ibm.com!mrkstn
       or         ...@cs.utexas.edu:ibmaus!auschs!mrkstn.austin.ibm.com!mrkstn

><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><
 IBM Corp. - Adv. Workstation Div
  Troy N. Hicks                                      IBM Tieline:  678-7318
   Mail Stop: ZIP 4359                                USA Phone: (512)838-7318
   Austin, Texas  78758
   IBM VNet:  TROYHIX at AUSVM9        IBM InterNet:
troy@troyhix.austin.ibm.com
   USA InterNet:  uunet!cs.utexas.edu!ibmaus!auschs!troyhix.austin.ibm.com!troy
       or         ...@cs.utexas.edu:ibmaus!auschs!troyhix.austin.ibm.com!troy
              /*   I DO NOT speak for IBM, only for MYSELF   */
><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><