Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!mcsun!hp4nl!cwi.nl!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.arch
Subject: Re: IEEE arithmetic
Message-ID: <3707@charon.cwi.nl>
Date: 15 Jun 91 22:23:08 GMT
References: <9106150258.AA16308@ucbvax.Berkeley.EDU>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 69

In article <9106150258.AA16308@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM
writes, trying to sidetrack me again.

The origin of the discussion was a remark that interval arithmetic in
software had been observed to be 300 times as slow as standard hardware
floating point, while interval arithmetic using the hardware rounding
modes had been observed to be 3 times as slow.  Shearer questioned the
number 3; I thought he questioned the order of magnitude, but apparently
he wants to know the exact number.  Of course that cannot be given, as it
is entirely machine dependent.  The factor would also be a function of
the mix of operations performed.

I gave an example of interval addition in four instructions and asked why
it would be more than 3x slower.  In doing so I forgot heavily pipelined
machines that do not carry the current rounding mode along in the
pipeline, which forces a slow setting of the rounding mode (the pipe must
be empty to modify it).  If the current rounding mode is carried along in
the pipe, setting the rounding mode is no problem.

JBS:
> It will be at least 4 times slower on the Risc System 6000.

Granted.  Particular rounding modes, and changing them on the fly, are not
particularly well supported on the RS6000.  There are also enough
machines where it will be less than 3 times slower.

About the multiplication routine, JBS:
>          1.  Handling this by a subroutine call would seem to require 4
> floating loads to pass the input arguments and 2 floating loads to
> retrieve the answers.  This is already expensive.

But in most cases you have to load these 4 numbers anyhow, so why would
this be more expensive?
Why loads to retrieve the answers?  Why not just let them sit in the FP
registers?

>          2.  All the floating compares and branches will severely impact
> performance on the Risc System 6000.  The slowdown will be much more
> than 3x.

Oh yes, on the RS6000.  There are machines that will behave differently.
Moreover, it was never said that an exact factor of 3 is involved; that
factor was simply observed for a set of programs.

>          3.  Do you know of any machine where the above code will average
> 3x (or less) the time of a single multiply?

So this is irrelevant.  What has been forgotten is that the same stuff in
software would be extremely expensive.  JBS, can you show a reasonable
interval add in software?  And an interval multiply?

> On another topic Dik Winter said:
> But you can get reasonable results without any pivoting if the condition
> is very good!

I should have added the context: not only must the condition of the
complete matrix be very good, but also that of all its principal minors.
Of course you should never take a pivotal element that is too small.

>
> This matrix   1+e  1-e   (e small) has bad condition number and is not
>               1-e  1+e   helped by pivoting.

Yes, the condition is 1/e.  Assume LDL' decomposition.  The relative
errors in L are small compared to the elements, so we have to look at the
diagonal matrix D.  Its second element is 4e/(1+e).  Given a machine
precision t, the absolute error in that element is (%) t.|(1-e)^2/(1+e)|,
for a relative error (%) t.|(1-e)^2/4e|.  So if e>0 the relative error is
smaller than if e<0, showing that pivoting helps even here!  Only
slightly, of course, because e is small; but as always with pivoting, it
does not matter very much whether you pivot on the largest element or on
an element reasonably close to it.

But I am already severely sidetracked again.
--
(%) bounded by.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
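
The LDL' arithmetic for the 2x2 matrix above can be checked with exact
rationals.  What follows is only a minimal sketch, not code from the
thread: the function names ldlt_2x2 and d2_relative_error_factor are
hypothetical, and e = 1/1000 is an arbitrary illustrative value.  It
confirms that the second diagonal element is 4e/(1+e) and compares the
factor |(1-e)^2/4e| from the relative-error bound for e>0 and e<0.

```python
from fractions import Fraction


def ldlt_2x2(a11, a21, a22):
    """Unpivoted LDL' of the symmetric matrix [[a11, a21], [a21, a22]].

    Returns (d1, l21, d2) such that A = L D L' with L unit lower triangular.
    """
    d1 = a11
    l21 = a21 / d1
    d2 = a22 - l21 * a21
    return d1, l21, d2


def d2_relative_error_factor(e):
    """Factor multiplying t in the quoted bound: |(1-e)^2 / (4e)|."""
    return abs((1 - e) ** 2 / (4 * e))


# Assumed illustrative value of e; any small nonzero rational would do.
e = Fraction(1, 1000)

d1, l21, d2 = ldlt_2x2(1 + e, 1 - e, 1 + e)

# In exact arithmetic the second pivot is 4e/(1+e).
assert d2 == 4 * e / (1 + e)

pos = d2_relative_error_factor(e)    # diagonal 1+e is the larger entry
neg = d2_relative_error_factor(-e)   # diagonal 1-|e| is the smaller entry

print("d2 =", d2)
print("bound factor, e > 0:", float(pos))
print("bound factor, e < 0:", float(neg))
```

The two factors differ by ((1-e)/(1+e))^2, which is close to 1 for small
e, in line with the remark that pivoting helps only slightly here.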