Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!samsung!usc!wuarchive!udel!udccvax1!mccalpin
From: mccalpin@vax1.acs.udel.EDU (John D Mccalpin)
Newsgroups: comp.arch
Subject: Re: RS6000 Multiply/Accumulate instruction
Summary: LINPACK random numbers are the same on all machines
Keywords: RS6000, floating-point multiply, fp add
Message-ID: <5900@udccvax1.acs.udel.EDU>
Date: 21 Mar 90 12:29:34 GMT
References: <5827@udccvax1.acs.udel.EDU> <8907@boring.cwi.nl>
Organization: College of Marine Studies, Univ. of Delaware
Lines: 67

I wrote about the RMS error in the solution of the LINPACK 1000x1000
system of equations on the new IBM RS/6000 vs various other machines.

The RMS error that I have been posting relies on the fact that the
system is constructed in such a way that the exact solution consists
of 1000 identical elements equal to 1.0d0.  The test code does this by
generating a pseudo-random matrix and then using its row sums as the
right-hand side.

In article <8907@boring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
> The problem I see is that to compare arithmetic properties of processors
> you must use *identical* input data on which to perform the operations.
> When using random numbers from the system supplied random number generator
> you cannot be sure of this (and in general it is not true). 

The pseudo-random numbers are generated by the LINPACK test code and
are identical on all machines.  The two errors that I was referring to
are (1) the error in calculating the sum of each row, and (2) the
error in the LU-decomposition and back-substitution.

The code is:

      subroutine matgen(a,lda,n,b,norma)
      double precision a(lda,1),b(1),norma
      init = 1325
      norma = 0.0
      do 30 j = 1,n
         do 20 i = 1,n
            init = mod(3125*init,65536)		! pseudo-random number
            a(i,j) = (init - 32768.0)/16384.0	! generator
            norma = dmax1(a(i,j), norma)	! largest element of matrix
   20    continue
   30 continue
      do 35 i = 1,n
          b(i) = 0.0
   35 continue
      do 50 j = 1,n
         do 40 i = 1,n
            b(i) = b(i) + a(i,j)		! RHS=row sums of Matrix
   40    continue
   50 continue
      end
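
A minimal sketch of the test described above, ported to Python for
illustration.  The generator and row-sum loops follow the posted matgen;
the solver is a plain Gaussian elimination with partial pivoting standing
in for the LINPACK routines, so its operation order (and therefore its
exact error) is an assumption and will not reproduce the posted figures.
The names `solve`, `rms_error` and `sums_exact` are hypothetical, and
n = 100 is used only to keep the pure-Python run short.

```python
from fractions import Fraction
import math

def matgen(n):
    """Port of the posted matgen: returns matrix a and right-hand side b."""
    init = 1325
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):              # same column-by-column fill order
        for i in range(n):
            init = (3125 * init) % 65536
            a[i][j] = (init - 32768.0) / 16384.0
    b = [0.0] * n
    for j in range(n):
        for i in range(n):
            b[i] += a[i][j]         # RHS = row sums, so exact solution is all 1.0
    return a, b

def solve(a, b):
    """Gaussian elimination with partial pivoting, then back-substitution."""
    n = len(b)
    a = [row[:] for row in a]       # work on copies
    x = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        x[k], x[p] = x[p], x[k]
        for r in range(k + 1, n):
            m = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= m * a[k][c]
            x[r] -= m * x[k]
    for k in range(n - 1, -1, -1):
        s = x[k]
        for c in range(k + 1, n):
            s -= a[k][c] * x[c]
        x[k] = s / a[k][k]
    return x

def rms_error(x):
    """RMS deviation of the computed solution from the exact value 1.0."""
    return math.sqrt(sum((xi - 1.0) ** 2 for xi in x) / len(x))

n = 100
a, b = matgen(n)
x = solve(a, b)
err = rms_error(x)

# Compare the floating-point row sums with exact rational row sums.
sums_exact = all(
    Fraction(b[i]) == sum(Fraction(v) for v in a[i]) for i in range(n)
)
print("RMS error:", err, " row sums exact:", sums_exact)
```

In this sketch every matrix entry is an exact multiple of 1/16384, so the
double-precision row sums come out exact and the printed RMS error
reflects only the elimination and back-substitution.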

I have used the subroutine DGECO from LINPACK to estimate the condition
number of this matrix and it is about 200,000, as I recall.

> So my conclusion still is that the difference in errors between the IBM and
> Sun/IEEE implementations are within the noise.  It may be there is indeed
> some flaw in the IBM implementation, but that requires more convincing
> figures.
> -- 
> dik t. winter, cwi, amsterdam, nederland
> dik@cwi.nl

I have been very careful not to call the IBM implementation flawed.
I have merely been pointing out that, on identical calculations, the
answers differ from those of an IEEE calculation.  My experience has
been that even the direction of the error (i.e. better or worse
answers) is data-dependent.

I am still trying to get the machine to run with the multiply/add
function separated, but the new O/S that they put on the FSU machine
seems to be broken....
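
To illustrate why a combined multiply/add can give different answers from
separate operations, here is a small Python sketch.  It assumes that the
combined instruction rounds once, after the add, while the separate form
rounds after the multiply and again after the add.  The single rounding is
emulated with exact rational arithmetic rather than a hardware
instruction, and both function names are hypothetical.

```python
from fractions import Fraction

def separate_multiply_add(a, b, c):
    """Multiply, round to double, then add and round again."""
    return a * b + c

def single_rounding_multiply_add(a, b, c):
    """Form a*b + c exactly as a rational, then round to double once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# a*b is exactly 1 - 2**-60, which rounds to 1.0 in double precision.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

two_roundings = separate_multiply_add(a, b, c)
one_rounding = single_rounding_multiply_add(a, b, c)
print(two_roundings, one_rounding)
```

For these inputs the separate form loses the product's low-order bits and
returns 0.0, while the single-rounding form keeps them and returns
-2**-60.  Which of the two ends up closer to the true answer over a long
calculation depends on the data.
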
-- 
John D. McCalpin                               mccalpin@vax1.acs.udel.edu
Assistant Professor                            mccalpin@delocn.udel.edu
College of Marine Studies, U. Del.             mccalpin@scri1.scri.fsu.edu