Path: utzoo!attcan!uunet!ubvax!ames!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Standard Un*x H/W architecture
Message-ID: <12005@ames.arc.nasa.gov>
Date: 19 Jul 88 15:21:19 GMT
References: <980@garth.UUCP> <76700037@p.cs.uiuc.edu>
Reply-To: lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 51

In article <76700037@p.cs.uiuc.edu> gillies@p.cs.uiuc.edu writes:
>I once heard an expert on floating-point arithmetic state that CDC's
>1's complement arithmetic was used JUST BECAUSE IT RUNS FASTER.  In
>fact, the engineer estimated their 1's complement could always be
>implemented to run 10% faster than IEEE arithmetic.  Since

CDC/ETA uses 2's complement on the Cyber 200/ETA-10 series of machines,
although they do not use IEEE.  So, you can teach an old dog new tricks,
sometimes.
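
For readers who have not worked with both representations, here is a
minimal Python sketch of how 1's and 2's complement integers differ.
It assumes an 8-bit word purely for illustration, and the helper names
(neg_ones, add_ones, and so on) are hypothetical, not taken from any
CDC or ETA documentation. It says nothing about relative speed; it only
shows the two zeros and the end-around carry that 1's complement needs.

```python
BITS = 8                      # assumed word size, for illustration only
MASK = (1 << BITS) - 1

def neg_ones(x):
    """1's complement negation: invert every bit."""
    return ~x & MASK

def neg_twos(x):
    """2's complement negation: invert every bit and add one."""
    return (-x) & MASK

def add_ones(a, b):
    """1's complement addition with end-around carry."""
    s = a + b
    if s > MASK:              # carry out of the top bit wraps around
        s = (s & MASK) + 1
    return s

def add_twos(a, b):
    """2's complement addition: the carry out is simply dropped."""
    return (a + b) & MASK

print(hex(neg_ones(0)), hex(neg_twos(0)))
print(hex(add_ones(5, neg_ones(5))), hex(add_twos(5, neg_twos(5))))
```

In the 1's complement case, 5 + (-5) yields the all-ones pattern, a
second ("negative") zero, while 2's complement has a single zero.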

There is a second argument that has come up frequently enough to
warrant a discussion:

IF a particular program gets correct answers
using the IEEE standard, and incorrect answers using a less robust
format such as Cray's current format, is it better to get the
wrong answer 10% faster?  This is, in fact, the problem that
Kahan has been trying to address for the last decade.  More
realistically, it would be interesting to compare the rate of 
convergence of common iterative algorithms using IEEE, VAX, Cray,
ETA, and IBM arithmetic (to name some common formats) and see if
IEEE is significantly better.  Naturally, it would be allowable 
to rewrite the code completely in each case, so as to deal with
the error detection and recovery features (or lack thereof)
of each arithmetic.
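
As a rough illustration of the kind of experiment I mean, here is a
minimal Python sketch that runs the same iteration under two different
arithmetics and reports where each one stops. It is only a sketch under
stated assumptions: it emulates IEEE single precision by rounding every
intermediate result through the struct module, and compares that with
the host's native double. It does not model VAX, Cray, ETA, or IBM
formats, and the names round_single and heron_sqrt are hypothetical.

```python
import struct

def round_single(x):
    """Round a Python float (a double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def identity(x):
    """No extra rounding: keep the host's native double precision."""
    return x

def heron_sqrt(a, rnd, max_steps=100):
    """Heron's iteration for sqrt(a), a >= 1, rounding each operation with rnd.

    Starting from x = a the iterates decrease toward the root, so the
    loop stops as soon as an iterate fails to decrease.
    """
    x = rnd(a)
    for step in range(1, max_steps + 1):
        x_new = rnd(rnd(x + rnd(a / x)) / 2.0)
        if x_new >= x:
            return x, step
        x = x_new
    return x, max_steps

root32, steps32 = heron_sqrt(2.0, round_single)
root64, steps64 = heron_sqrt(2.0, identity)
print("single:", root32, "after", steps32, "steps")
print("double:", root64, "after", steps64, "steps")
```

A real comparison would substitute a rounding function for each format
of interest and use an algorithm that is actually sensitive to the
differences; this one converges under nearly any reasonable arithmetic.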

I did not bring this up in my previous postings, but eventually,
when supercomputers also use IEEE, there will be the added
benefit that the same program behaves the same way on all
systems.  I don't know how many people reading this have run
into this problem, but I have seen many programmer hours wasted
trying to figure out why a particular algorithm converged on
one machine and diverged on another.

Getting back to the original argument, there are plenty of
cases involving graphics where converting the data between
different formats costs more than it would have cost to accept
slightly lower performance while generating the data and
avoid the conversion altogether.

Finally, I understand that handling the IEEE gradual underflow
behavior can add an extra cycle of latency.  I also have
observed that the MIPS R2010 FPA (and maybe the new R3010 also)
can do a floating add in 2 (!) clock cycles.  How did they do
that?
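
For anyone unfamiliar with what gradual underflow buys you, here is a
minimal Python sketch. It assumes the host's floats are IEEE doubles
with subnormals enabled, which is the usual case but still an
assumption. The flush_to_zero helper is a hypothetical model of an
arithmetic without gradual underflow, not a description of any
particular machine.

```python
import sys

TINY = sys.float_info.min     # smallest positive normalized double, 2**-1022

def flush_to_zero(v):
    """Hypothetical model: any result below the normalized range becomes 0."""
    return 0.0 if abs(v) < TINY else v

x = 1.5 * TINY
y = TINY
diff = x - y                  # exactly 2**-1023, representable only as a subnormal

# With gradual underflow, x != y guarantees x - y != 0.
print(x != y, diff != 0.0)

# Without it, two distinct numbers can have a difference of zero.
print(x != y, flush_to_zero(diff) != 0.0)
```

The guarantee that x != y implies x - y != 0 is what protects code such
as "if (x != y) z = 1/(x - y)" from a divide by zero, and the extra
hardware work of handling those subnormal results is where the added
latency would come from.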

-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117