Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!ucbvax!WATSON.IBM.COM!jbs
From: jbs@WATSON.IBM.COM
Newsgroups: comp.arch
Subject: massive linpack
Message-ID: <9106080254.AA12624@ucbvax.Berkeley.EDU>
Date: 8 Jun 91 02:57:58 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Lines: 83


          Richard Lethin asked:
>At what size N for massive linpack does the accumulated numerical fuzz
>swamp the precision available in a double precision number?
          I answered:
>         Very large.  Typical relative error (in the vector norm sense)
>in the solution is n*c*e, where n is matrix size, c is condition number
>and e is machine epsilon (2**-52 for IEEE double).  Therefore if you
>want a solution accurate to 1 part in 1000 (10 bits) and your matrix
>has condition number less than 1000 (10 bits) your n can be 2**30 or
>10**9.  This would require 10**18 storage and 10**27 ops to solve so
>is well beyond current machines.  Note once you have any bits in your
>answer you can obtain more by iterative improvement an order N*2 pro-
>cess.
          David Chase objects:
I don't have the time to prove that you're wrong here, but I think
that you are either wrong, or else being very misleading.  In general,
why should we believe you?  What do you usually do?  Goldberg and
Hough are experts, as are the people that I took numerical analysis
from too many years ago.  Are you an expert?  (I only know enough to
know when I should be suspicious, but I'm suspicious.)
First, you haven't said anything about worst case, which must be
worried about.  In addition, you haven't discussed a typical condition
number.
Second, I know that the NA people I once worked with were interested
in using exact arithmetic for error analysis. (There's a paper by
Boehm, O'Donnell, and Riggle in the 1986 Lisp & FP conference -- when
I say "exact", I mean an answer guaranteed to be within epsilon of the
true answer).  This means that double precision isn't good enough for
very accurate error analysis, though I don't know how they defined
"very accurate".  I suspect that the worst case bounds on error
estimation were not satisfactory.
Third, "LINPACK" is not a numerical analysis algorithm.  There is a
wide variety of algorithms in LINPACK, with varying stability and
cost.  If you don't specify the algorithm, how are we to know what you
are talking about?  Is it GE, GEPP, GEFP, QR, SVD, or Cholesky?  Band
matrix, or not?  The worst case on GEPP (which is what I think you
meant) is still pretty bad, though I've also heard (from an expert)
that it is almost never seen.  If (as I think you are saying) GEPP is
so good, then why do we bother with more stable algorithms like QR and
SVD?

          To clarify I was referring to the numerical properties of
Gaussian elimination with partial pivoting (GEPP) when solving a large
dense system of linear equations.  This is what the TPP (towards peak
performance) linpack benchmark does (on a system of size 1000).  I
presume the massive linpack benchmark is the same thing with larger
matrices. Someone queried whether the quoted results were for problems
which required pivoting which I agree is a good question.
          Anyone desiring further authority for my statements about
GEPP should be able to find it in any good mathematics library.  One
reference is the book "Numerical Methods and Software" (David Kahaner,
Cleve Moler and Stephen Nash, Prentice Hall, 1989).  The accuracy of
GEPP is discussed in chapter 3 (esp. p.66-p.67).  It is well known
there are certain pathological matrices for which GEPP does not perform
well numerically. These matrices have the property that in the process
of computing the factorization certain intermediate terms are formed
which are much larger than the elements in the original matrix.  These
terms spoil the error analysis.  If this does not happen then you are
certain to be ok. This problem is almost never seen in practice.  There-
fore my statement about typical behavior.
          If you are using an appropriate numerical method the condition
number generally will be determined by the stabilty of the physical
system you are modeling and not by the number of points in your discret-
ization.  If your problem uses real inexact data the condition number
best not be too large or your answer is going to be garbage regardless
of how accurately you compute it.
          QR and SVD are used for solving least squares problems.  Least
squares problems can be formulated as linear equations (the normal equa-
tions) but doing so unnecessarily increases the condition number.  This
is not a fault with GEPP but with the formulation.  (This is discussed
on p202-203 of the above referenced book.)
          As to credibility: I prefer to judge participants in the news-
groups by the quality of their posts rather than by their "credentials"
(or lack thereof).  However for those readers who care about credentials
I do have a few.  I have a PhD in mathematics from MIT.  Numerical anal-
ysis is not my field of specialization but I have been exposed to it
over the years, I have access to a mathematics library and I hopefully
have enough mathematical sophistication to avoid gross misstatements.
As to practical experience I have contributed to IBM's VS Fortran (ver-
sion 2) intrinsic function library and to IBM's Engineering and Scienti-
fic Subroutine Library (ESSL).
                         James B. Shearer