Path: utzoo!attcan!uunet!ubvax!ames!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Standard Un*x H/W architecture
Message-ID: <12005@ames.arc.nasa.gov>
Date: 19 Jul 88 15:21:19 GMT
References: <980@garth.UUCP> <76700037@p.cs.uiuc.edu>
Reply-To: lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 51

In article <76700037@p.cs.uiuc.edu> gillies@p.cs.uiuc.edu writes:

>I once heard an expert on floating-point arithmetic state that CDC's
>1's complement arithmetic was used JUST BECAUSE IT RUNS FASTER.  In
>fact, the engineer estimated their 1's complement could always be
>implemented to run 10% faster than IEEE arithmetic.

CDC/ETA uses 2's complement on the Cyber 200/ETA-10 series of
machines, although they do not use IEEE.  So you can teach an old dog
new tricks, sometimes.

There is a second argument that has come up frequently enough to
warrant a discussion:  IF a particular program gets correct answers
using the IEEE standard, and incorrect answers using a less robust
format such as Cray's current format, is it better to get the wrong
answer 10% faster?  This is, in fact, the problem that Kahan has been
trying to address for the last decade.

More realistically, it would be interesting to compare the rate of
convergence of common iterative algorithms using IEEE, VAX, Cray, ETA,
and IBM arithmetic (to name some common formats) and see if IEEE is
significantly better.  Naturally, it would be allowable to rewrite the
code completely in each case, so as to deal with the error detection
and recovery features (or lack thereof) of each arithmetic.

I did not bring this up in my previous postings, but eventually, when
supercomputers are using IEEE also, there will be an added benefit:
the same program will behave the same way on all systems.
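
A minimal sketch of how the same program can behave differently under
two arithmetics.  It assumes Python floats are IEEE doubles with
gradual underflow, which holds on typical platforms.  The
flush_to_zero helper is a hypothetical stand-in for abrupt underflow;
it does not model any particular Cray, CDC, ETA, VAX, or IBM format,
and all of the names below are made up for illustration.

```python
import sys

# Smallest positive *normal* double, 2**-1022.
TINY = sys.float_info.min


def flush_to_zero(v):
    """Hypothetical abrupt underflow: anything below the normal range becomes 0."""
    return 0.0 if abs(v) < TINY else v


def ieee_sub(x, y):
    """Subtraction as Python does it (assumed IEEE double, gradual underflow)."""
    return x - y


def ftz_sub(x, y):
    """The same subtraction with the result flushed to zero on underflow."""
    return flush_to_zero(x - y)


def guarded_reciprocal(x, y, subtract):
    """Return 1/(x - y), trusting the x != y test to rule out a zero divisor."""
    if x == y:
        return None
    return 1.0 / subtract(x, y)


# Two distinct values whose difference lies below the normal range.
x = 1.5 * TINY
y = TINY

if __name__ == "__main__":
    print("gradual underflow:", guarded_reciprocal(x, y, ieee_sub))
    try:
        print("flush to zero:   ", guarded_reciprocal(x, y, ftz_sub))
    except ZeroDivisionError:
        print("flush to zero:    division by zero although x != y")
```

With gradual underflow, x != y guarantees that x - y is nonzero, so
the guard is sufficient.  With the flushed subtraction the same guard
passes and the division still fails, which is the kind of
machine-dependent surprise that costs programmer hours.
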
I don't know how many people reading this have run into this problem,
but I have seen many programmer hours wasted trying to figure out why
a particular algorithm converged on one machine and diverged on
another.

Getting back to the original argument, there are plenty of cases
involving graphics where it is more expensive to convert the data
between different formats than it would have been to accept slightly
lower performance when generating the data and avoid the cost of
conversion.

Finally, I understand that handling the IEEE gradual underflow
behavior can add an extra cycle of latency.  I have also observed that
the MIPS R2010 FPA (and maybe the new R3010 also) can do a floating
add in 2 (!) clock cycles.  How did they do that?

-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035
  Phone:  (415)694-6117