Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!mcsun!hp4nl!cwi.nl!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.arch
Subject: Re: IEEE arithmetic
Message-ID: <3707@charon.cwi.nl>
Date: 15 Jun 91 22:23:08 GMT
References: <9106150258.AA16308@ucbvax.Berkeley.EDU>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 69

In article <9106150258.AA16308@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes
trying to sidetrack me again.

The origin of the discussion was a remark that interval arithmetic in software
had been observed to be 300 times as slow as standard hardware floating point,
whereas interval arithmetic using the hardware rounding modes had been observed
to be 3 times as slow.  Shearer questioned the number 3; I thought he
questioned the order of magnitude, but apparently he wants to know the exact
number.  Of course that cannot be given, as it is entirely machine dependent.
The factor would also be a function of the mix of operations performed.

I gave an example of interval addition in four instructions and asked
why it would be more than 3x slower.  In doing so I forgot heavily pipelined
machines that do not carry the current rounding mode along in the pipeline,
which forces a slow setting of the rounding mode (the pipe must be empty to
modify it).  If the current rounding mode is carried along in the pipe,
setting the rounding mode poses no problem.
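
To make the four steps concrete, here is a minimal sketch in Python.  It is
an illustration only: two decimal contexts (6 digits, an arbitrary choice)
stand in for the two hardware rounding modes, and the names interval_add,
TOWARD_MINUS_INF and TOWARD_PLUS_INF are made up for this example.  On real
hardware each change of context would be a set-rounding-mode instruction.

```python
from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING

# Stand-ins for the two directed rounding modes (assumption: 6 digits).
TOWARD_MINUS_INF = Context(prec=6, rounding=ROUND_FLOOR)
TOWARD_PLUS_INF = Context(prec=6, rounding=ROUND_CEILING)

def interval_add(x, y):
    """Add intervals x = (x_lo, x_hi) and y = (y_lo, y_hi), rounding outward."""
    lo = TOWARD_MINUS_INF.add(x[0], y[0])  # steps 1-2: round down, add lower bounds
    hi = TOWARD_PLUS_INF.add(x[1], y[1])   # steps 3-4: round up, add upper bounds
    return lo, hi

one = (Decimal("1"), Decimal("1"))
tiny = (Decimal("0.0000001"), Decimal("0.0000001"))
lo, hi = interval_add(one, tiny)  # the exact sum 1.0000001 lies in [lo, hi]
```

The exact sum does not fit in 6 digits, so the two bounds come out one unit
in the last place apart and enclose it.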

JBS:
 >           It will be at least 4 times slower on the Risc System 6000.
Granted.  The particular rounding modes, and changing them on the fly, are
not particularly well supported on the RS6000.  There are also enough
machines where it will be less than 3 times slower.

About the multiplication routine, JBS:
 >      1.  Handling this by a subroutine call would seem to require 4
 > floating loads to pass the input arguments and 2 floating loads to
 > retrieve the answers.  This is already expensive.
But in most cases you have to load these 4 numbers anyhow, so why would this
be more expensive?  Why loads to retrieve the answers?  Why not just let
them sit in the FP registers?
 >      2.  All the floating compares and branches will severely impact
 > performance on the Risc System 6000.  The slowdown will be much more
 > than 3x.
Oh yes, on the RS6000.  There are machines that will behave differently.
Moreover, it was never said that there is an exact factor of 3 involved;
that factor was simply observed, for a set of programs.
 >      3.  Do you know of any machine where the above code will average
 > 3x (or less) the time of a single multiply?
So this is irrelevant.

What has been forgotten is that the same stuff in software would be
extremely expensive.  JBS, can you show a reasonable interval add in
software?  And an interval multiply?
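
For concreteness, an interval multiply built on directed rounding can be
sketched as below.  This is a sketch under stated assumptions, not the
routine discussed above: it takes the minimum and maximum over the four
endpoint products instead of doing a sign-by-sign case analysis with
compares and branches, decimal contexts (3 digits, arbitrary) stand in for
the rounding modes, and the name interval_mul is made up for this example.

```python
from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING

# Stand-ins for the two directed rounding modes (assumption: 3 digits).
TOWARD_MINUS_INF = Context(prec=3, rounding=ROUND_FLOOR)
TOWARD_PLUS_INF = Context(prec=3, rounding=ROUND_CEILING)

def interval_mul(x, y):
    """Multiply intervals x = (x_lo, x_hi) and y = (y_lo, y_hi), rounding outward."""
    pairs = [(a, b) for a in x for b in y]  # the four endpoint combinations
    lo = min(TOWARD_MINUS_INF.multiply(a, b) for a, b in pairs)  # rounded down
    hi = max(TOWARD_PLUS_INF.multiply(a, b) for a, b in pairs)   # rounded up
    return lo, hi

# Operands of mixed sign: the bounds come from different endpoint pairs.
mixed = interval_mul((Decimal("-2"), Decimal("3")), (Decimal("-1"), Decimal("4")))
# A product that does not fit in 3 digits: 1.01 * 1.01 = 1.0201 exactly.
square = interval_mul((Decimal("1.01"), Decimal("1.01")),
                      (Decimal("1.01"), Decimal("1.01")))
```

This version costs eight multiplies per interval product; a case analysis on
the signs of the endpoints trades most of those for compares and branches.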

 >           On another topic Dik Winter said:
 > But you can get reasonable results without any pivoting if the condition
 > is very good!
I should have added the context: not only is the condition of the complete
matrix very good, but so is that of all its principal minors.  Of course you
should never take a pivotal element that is too small.
 > 
 > This matrix 1+e 1-e (e small) has bad condition number and is not
 >             1-e 1+e           helped by pivoting.
Yes, the condition is 1/|e|.  Assume LDL' decomposition.  The relative errors
in L are small compared to the elements; we have to look at the diagonal
matrix D.  Its second element is (1+e) - (1-e)^2/(1+e) = 4e/(1+e).  Given a
machine precision t, the absolute error in that element is (%)
t.|(1-e)^2/(1+e)|, for a relative error (%) t.|(1-e)^2/4e|.  So if e>0 (the
larger element, 1+e, is on the diagonal) the relative error is smaller than
if e<0, showing that pivoting helps even here!  Only slightly, of course,
because e is small; but as always with pivoting, it does not matter so very
much whether you pivot on the largest element or on an element reasonably
close to it.
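
The numbers can be checked with a short sketch in exact rational
arithmetic.  The value e = 1/1000 is an arbitrary choice, and the function
names are made up for this example; the factor computed is the relative
error bound divided by t.

```python
from fractions import Fraction

def second_pivot(e):
    """Second diagonal element of D in the LDL' factorisation of
    the matrix [[1+e, 1-e], [1-e, 1+e]]."""
    d1 = 1 + e                # first diagonal element of D
    l21 = (1 - e) / d1        # the one subdiagonal element of L
    return (1 + e) - l21 * l21 * d1

def relative_error_factor(e):
    """The factor |(1-e)^2 / 4e| multiplying t in the relative error bound."""
    return abs((1 - e) ** 2 / (4 * e))

e = Fraction(1, 1000)                  # arbitrary small value (assumption)
d2_pos = second_pivot(e)               # equals 4e/(1+e)
factor_pos = relative_error_factor(e)  # larger element on the diagonal
factor_neg = relative_error_factor(-e) # smaller element on the diagonal
```

With this e the two factors are about 249.5 and 250.5: a difference, but a
slight one, as said.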

But I am already severely sidetracked again.
--
(%) bounded by.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl