Path: utzoo!mnetor!uunet!steinmetz!sunset!oconnor
From: oconnor@sunset.steinmetz (Dennis M. O'Connor)
Newsgroups: comp.arch
Subject: Re: Performance increase - a suggestion
Message-ID: <9408@steinmetz.steinmetz.UUCP>
Date: 3 Feb 88 18:49:18 GMT
References: <3127@phri.UUCP>
Sender: news@steinmetz.steinmetz.UUCP
Reply-To: sunset!oconnor@steinmetz.UUCP
Organization: GE Corporate R&D Center
Lines: 59

An article by colwell@m6.UUCP (Robert Colwell) says:
] I don't believe you can do double precision math as fast as single
] precision math if both are implemented in the same technology. If
] we're including division and sqrts, derived via a high-radix iterative
] procedure, it's certainly not true, since you get only one or two
] bits of mantissa per trip through the ALU. In that case the time
] to solution is proportional to the number of bits of result you want.

NO. Sorry. But the fastest floating point division and root
algorithms use Newton-Raphson iteration, where the time to
solution is proportional to log2( number_of_bits_of_result ),
because each iteration roughly doubles the number of correct bits.
That is, if single-precision takes 4 iterations, double precision
will take 5, and quad will take 6.

] If we're only discussing addition, subtraction, and multiplication,
] then I still don't believe it. There's an adder at the heart of each
] of those, and its width decides its speed -- the wider, the slower
] (more levels of carry-lookahead). If your choice is between making
] one engine to do both single and double precision (or dbl and quad),
] or making only dbl (quad), then I think the engine that has less to
] do can be made slightly faster.

Doubling the width of a carry-select adder adds one gate delay
to its latency. Carry-lookahead schemes pay a similar
less-than-proportionate speed penalty as the width grows.
Subtract is just an add. Floating point adds also need
de- and re-normalization. Multipliers are more complex: all the
fast stuff uses big Booth-encoded adder arrays.
But the final carry resolution, the justification and the
normalization logic add significantly to the latency of the multiply.

] My personal perception of the market for scientific computation is
] that given a choice between more precision and more speed, speed
] wins hands down.

Anyone wanna buy a high-speed FPU for 16-bit ( 5 bit exponent,
10 bit fraction, 1 bit sign ) reals? I didn't think so.

It doesn't matter how fast you are if your answer is wrong.
For many problems being tackled today, single-precision introduces
too much error. So it is not used, regardless of how much
faster it is.

Besides, double-precision ( 64 bit ) floating multiply and add
should only take about 20% longer than single-precision multiply
and add. Same for division and square roots.

] Bob Colwell
] Multiflow Computer
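
The Newton-Raphson iteration counts claimed earlier in this post (4, 5 and 6) can be checked with a minimal Python sketch. It is only an illustration under stated assumptions, not a description of any real FPU: the function name, the divisor 0.75 and the crude seed 1.0 are hypothetical, and the targets of 24, 53 and 113 significand bits for single, double and quad are assumed, since the post names no specific formats. It uses the standard reciprocal step x <- x*(2 - d*x), which squares the relative error on every pass, and exact rational arithmetic so that rounding does not hide the convergence.

```python
from fractions import Fraction


def iterations_for_precision(d, x0, bits):
    """Count Newton-Raphson reciprocal steps until |1 - d*x| <= 2**-bits.

    Assumes the seed x0 is good enough to converge, i.e. |1 - d*x0| < 1.
    """
    d = Fraction(d)
    x = Fraction(x0)
    target = Fraction(1, 2 ** bits)
    steps = 0
    while abs(1 - d * x) > target:
        x = x * (2 - d * x)  # each step squares the relative error
        steps += 1
    return steps


# Hypothetical operands: divisor 0.75 with a seed of 1.0,
# so the starting relative error is 1/4 (about 2 correct bits).
d, seed = Fraction(3, 4), Fraction(1)
for name, bits in (("single", 24), ("double", 53), ("quad", 113)):
    print(name, bits, iterations_for_precision(d, seed, bits))
# prints:
# single 24 4
# double 53 5
# quad 113 6
```

A better seed (a small lookup table, say) lowers all three counts, but each doubling of the result width still costs only one extra iteration. Each iteration does get somewhat slower as the multiplier widens, so this counts trips, not nanoseconds.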