Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!ucbvax!WATSON.IBM.COM!jbs
From: jbs@WATSON.IBM.COM
Newsgroups: comp.arch
Subject: Re: IEEE arithmetic
Message-ID: <9106190252.AA29755@ucbvax.Berkeley.EDU>
Date: 19 Jun 91 01:25:17 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Lines: 81

Dik Winter said:
The origin of the discussion was a remark that interval arithmetic in
software had been observed to be 300 times as slow as standard hardware
floating point, while interval arithmetic had been observed to be 3
times as slow.  Shearer questioned the number 3; I thought he
questioned the order of magnitude but apparently he wants to know the
exact number.  Of course that can not be given as it is entirely
machine dependent.  Also the factor would be a function of the mix of
operations performed.

        I believe the original remark referred to estimates, not
observations.  I questioned in passing whether 3x was a realistic
estimate.  I continue to believe it is extremely optimistic and that
10x slower would be more reasonable.

Dik Winter:
About the multiplication routine, JBS:
> 1.  Handling this by a subroutine call would seem to require 4
> floating loads to pass the input arguments and 2 floating loads to
> retrieve the answers.  This is already expensive.
But you have in most cases to load these 4 numbers anyhow, so why would
this be more expensive?  Why loads to retrieve the answers?  Why not
just let them sit in the FP registers?

        If you are doing your operations memory to memory then this is
correct.  However, if you are keeping intermediate results in registers,
as will often be the case, then you must move your 4 operands from the
registers they are in to the registers the subroutine expects them in,
and you must move your 2 answers out of the return registers into the
registers you want them in (if you just let them sit they will be wiped
out by the next call).  In general it will be possible to avoid doing
some of this movement, but I don't see how to avoid all of it.
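        A rough sketch of such an interval multiply subroutine, in C
and using the fesetround() interface to the directed rounding modes
(illustrative only; the routine name and calling convention are made
up for the example):

#include <fenv.h>
#pragma STDC FENV_ACCESS ON    /* the rounding mode is changed below */

/* Sketch: interval multiply [alo,ahi] x [blo,bhi] -> [*clo,*chi].
   Four operands in and two results out, so the caller has to marshal
   six floating values per call, which is the traffic discussed above.  */
void int_mul(double alo, double ahi, double blo, double bhi,
             double *clo, double *chi)
{
    int old = fegetround();
    double lo, hi, p;

    fesetround(FE_DOWNWARD);      /* lower bound: round toward -infinity */
    lo = alo * blo;
    if ((p = alo * bhi) < lo) lo = p;
    if ((p = ahi * blo) < lo) lo = p;
    if ((p = ahi * bhi) < lo) lo = p;

    fesetround(FE_UPWARD);        /* upper bound: round toward +infinity */
    hi = alo * blo;
    if ((p = alo * bhi) > hi) hi = p;
    if ((p = ahi * blo) > hi) hi = p;
    if ((p = ahi * bhi) > hi) hi = p;

    fesetround(old);
    *clo = lo;
    *chi = hi;
}

Even with the rounding done by the hardware modes, each interval
multiply in this sketch is four multiplies, six compares, two
rounding-mode switches, and a call, plus the operand and result
movement described above.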
Dik Winter:
Moreover, never it was said that there is an exact factor of 3
involved; that factor was simply observed, for a set of programs.
> 3.  Do you know of any machine where the above code will average
> 3x (or less) the time of a single multiply?
So this is irrelevant.

        Who observed this?  Under what conditions?

Dik Winter:
> On another topic Dik Winter said:
> But you can get reasonable results without any pivoting if the condition
> is very good!
I should have added the context, where not only the condition of the
complete matrix is very good, but also of all its principal minors.

        This still isn't right.  Consider

             e  1
             1  e

with e small (but not zero).  The matrix and its leading principal
minor are both well conditioned, yet elimination without pivoting must
divide by the tiny e, and the resulting element growth can destroy the
accuracy of the answer.

Dik Winter (in a later post):
I am *not* an advocate for interval arithmetic (the people at Karlsruhe
are).  I do not use it.  But I object to the way Shearer handles this:
a.  Shearer asks: what is the justification for the different rounding
    modes.
b.  Many responses come: interval arithmetic.
c.  Shearer asks: would it not be better helped with quad arithmetic?
d.  Response: observed speed difference a factor 3 with hardware
    rounding modes, a factor 300 in software.
e.  Shearer questions the factor 3.  Apparently he believes the factor
    300 (does he?).  Even if the factor 3 would degrade on other
    machines to a factor of 5 or even 10, the difference with 300 is
    still striking.
I ask Shearer again: come with an interval add assuming the base
arithmetic is round to nearest only (or even worse, with truncating
arithmetic, which you advocate in another article).

        Some comments:
        Regarding b: if interval arithmetic is the only reason for the
different rounding modes, then I think they may safely be junked.
        Regarding c: what I actually said was that I thought quad
precision would provide some support for interval arithmetic (not
better support).  In any case, upon further reflection, I will withdraw
this statement.
        Regarding d: as I said above, I believe these were estimates.
        Regarding e: I don't really believe the 300x either, and I
particularly don't believe the implied 100x ratio.
        Regarding how I would implement interval arithmetic: it is not
particularly difficult to do without the rounding modes as long as you
don't insist on maintaining the tightest possible intervals (a sketch
of such an add appears below).  There is no great loss in being a
little sloppy, since an extra ulp only matters for narrow intervals,
and the main problem with interval arithmetic is that the intervals
don't stay narrow even if you are careful.
                         James B. Shearer
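The sketch referred to above: an interval add in C that assumes only
round-to-nearest arithmetic and widens each endpoint outward by one
ulp with nextafter().  This is illustrative only and deliberately does
not give the tightest possible bounds:

#include <math.h>

/* Interval add [alo,ahi] + [blo,bhi] -> [*clo,*chi] using only
   round-to-nearest arithmetic.  Each sum is pushed outward by one
   ulp, so the result is guaranteed to enclose the exact interval
   while being at most one ulp wider per endpoint than necessary.  */
void int_add(double alo, double ahi, double blo, double bhi,
             double *clo, double *chi)
{
    *clo = nextafter(alo + blo, -HUGE_VAL);   /* move lower bound down */
    *chi = nextafter(ahi + bhi,  HUGE_VAL);   /* move upper bound up   */
}

The same one-ulp widening works for the other operations (ignoring
overflow and other edge cases), which is the sense in which the sloppy
approach needs no directed rounding modes at all.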