Path: utzoo!mnetor!uunet!steinmetz!sunset!oconnor
From: oconnor@sunset.steinmetz (Dennis M. O'Connor)
Newsgroups: comp.arch
Subject: Re: Performance increase - a suggestion
Message-ID: <9408@steinmetz.steinmetz.UUCP>
Date: 3 Feb 88 18:49:18 GMT
References: <3127@phri.UUCP>
Sender: news@steinmetz.steinmetz.UUCP
Reply-To: sunset!oconnor@steinmetz.UUCP
Organization: GE Corporate R&D Center
Lines: 59

An article by colwell@m6.UUCP (Robert Colwell) says:
] I don't believe you can do double precision math as fast as single
] precision math if both are implemented in the same technology.  If
] we're including division and sqrts, derived via a high-radix iterative
] procedure, it's certainly not true, since you get only one or two
] bits of mantissa per trip through the ALU.  In that case the time
] to solution is proportional to the number of bits of result you want.

NO. Sorry. But the fastest floating-point division and square-root
algorithms use Newton-Raphson iteration, where the time to
solution is proportional to log2( number_of_bits_of_result ),
because each iteration roughly doubles the number of correct bits.
That is, if single precision takes 4 iterations, double precision
will take 5, and quad will take 6.
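
A minimal sketch of why the iteration count grows only as log2 of the precision: the Newton-Raphson reciprocal step x <- x*(2 - d*x) turns a relative error e = 1 - d*x into exactly e*e, so the number of correct bits doubles per step. Assumptions (none of them from the post): the divisor is normalized into [1/2, 1], the seed is the linear approximation 48/17 - 32/17*d (worst-case relative error 1/17, about 4 bits) instead of the lookup table real hardware would use, and exact rational arithmetic is used so the measured error is the algorithm's own. The function name `reciprocal_newton` is hypothetical. With this seed, single/double/quad need 3/4/5 steps; a seed good to only about 2 bits gives the post's 4/5/6. Either way each doubling of precision costs one more step. Division a/d is then a * (1/d).

```python
from fractions import Fraction


def reciprocal_newton(d: Fraction, target_bits: int):
    """Approximate 1/d for d in [1/2, 1] by Newton-Raphson iteration.

    Returns (approximation, iterations). Exact Fractions are used so the
    error measured is that of the iteration, not of Python floats.
    """
    if not (Fraction(1, 2) <= d <= 1):
        raise ValueError("d must be normalized into [1/2, 1]")
    # Assumed linear seed: max relative error 1/17 (about 4 bits) on [1/2, 1].
    x = Fraction(48, 17) - Fraction(32, 17) * d
    tolerance = Fraction(1, 2 ** target_bits)
    iterations = 0
    # Relative error of x is |1 - d*x|; each Newton step squares it.
    while abs(1 - d * x) >= tolerance:
        x = x * (2 - d * x)
        iterations += 1
    return x, iterations


for name, bits in [("single", 24), ("double", 53), ("quad", 113)]:
    _, n = reciprocal_newton(Fraction(3, 4), bits)
    print(f"{name:6s} ({bits:3d} bits): {n} iterations")
```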

] If we're only discussing addition, subtraction, and multiplication,
] then I still don't believe it.  There's an adder at the heart of each
] of those, and its width decides its speed -- the wider, the slower
] (more levels of carry-lookahead).  If your choice is between making
] one engine to do both single and double precision (or dbl and quad),
] or making only dbl (quad), then I think the engine that has less to
] do can be made slightly faster.

Doubling the width of a carry-select adder adds only one gate delay
to its latency. Carry-lookahead schemes have a similarly less-than-proportional
delay penalty. Subtraction is just addition.
Floating-point adds also need de- and re-normalization.
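
The "one more delay per doubling" behaviour can be sketched with a recursive carry-select (conditional-sum) adder: each half is added for both possible carry-ins in parallel, and the low half's carry-out then selects one of the two precomputed high-half results through a multiplexer. The depth counting is a deliberately crude unit-delay model (an assumption: a 1-bit full adder is one level and each multiplexer stage is one level; a real mux costs one or two gate delays depending on the technology), and the function name `carry_select_add` is hypothetical.

```python
def carry_select_add(a, b, width):
    """Recursive carry-select (conditional-sum) add of unsigned width-bit a, b.

    Returns ({carry_in: (sum, carry_out)}, depth), where depth is the number
    of levels on the critical path under the unit-delay model described above.
    """
    if width == 1:
        results = {}
        for cin in (0, 1):
            total = (a & 1) + (b & 1) + cin
            results[cin] = (total & 1, total >> 1)
        return results, 1  # a 1-bit full adder counts as one level (assumption)
    half = width // 2
    mask = (1 << half) - 1
    low, low_depth = carry_select_add(a & mask, b & mask, half)
    high, high_depth = carry_select_add(a >> half, b >> half, width - half)
    results = {}
    for cin in (0, 1):
        low_sum, low_carry = low[cin]
        # The select step: the low carry-out picks a precomputed high result.
        high_sum, high_carry = high[low_carry]
        results[cin] = ((high_sum << half) | low_sum, high_carry)
    return results, max(low_depth, high_depth) + 1


for width in (8, 16, 32, 64):
    _, depth = carry_select_add(0, 0, width)
    print(width, depth)  # 8 4, 16 5, 32 6, 64 7: one level per doubling
```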

Multipliers are more complex: all the fast ones use big
Booth-encoded adder arrays. But the final carry resolution,
justification, and normalization logic add significantly to
the latency of the multiply.
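
For readers who haven't met Booth encoding, here is a sketch of radix-4 ("modified") Booth recoding. The post doesn't say which radix it means, so radix 4 is an assumption here, and the helper names are hypothetical. Recoding the multiplier into digits in {-2, -1, 0, 1, 2} halves the number of partial products the adder array has to sum, and each partial product is just 0, +-x or +-2x (a shift), shifted into place.

```python
def booth_radix4_digits(y, width):
    """Radix-4 Booth recoding of a width-bit two's-complement integer y.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first, with
    sum(d * 4**i) == y. Digit i comes from the overlapping bit window
    (y[2i+1], y[2i], y[2i-1]), taking y[-1] = 0.
    """
    if width % 2:
        raise ValueError("width must be even")
    bits = y & ((1 << width) - 1)   # two's-complement bit pattern of y
    digits = []
    prev = 0                        # y[2i-1]
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1        # y[2i]
        b1 = (bits >> (i + 1)) & 1  # y[2i+1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x, y, width):
    """Multiply x by the width-bit signed y by summing Booth partial products.

    Returns (product, number_of_partial_products).
    """
    digits = booth_radix4_digits(y, width)
    partial_products = [d * x * 4 ** i for i, d in enumerate(digits)]
    return sum(partial_products), len(partial_products)


print(booth_multiply(-7, 13, 8))  # (-91, 4): 4 partial products, not 8
```

In hardware the final sum of those partial products is the carry resolution the post mentions; this sketch simply uses Python's `sum`.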

] My personal perception of the market for scientific computation is
] that given a choice between more precision and more speed, speed
] wins hands down.

Anyone wanna buy a high-speed FPU for 16-bit ( 5 bit exponent,
10 bit fraction, 1 bit sign ) reals ? I didn't think so.
It doesn't matter how fast you are if your answer is wrong.
For many problems being tackled today, single-precision
introduces too much error. So it is not used, regardless
of how much faster it is.
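
To make the precision point concrete: the 16-bit layout above (1 sign, 5 exponent, 10 fraction bits) matches what IEEE 754 later standardized as binary16, and Python's `struct` module packs it with format code `'e'` (`'f'` is binary32 single precision). The helpers `round_to` and `running_sum` are hypothetical, written for this sketch; they emulate format arithmetic by computing in double and rounding after every operation, which is correctly rounded here because a double carries more than twice the significand bits of either format.

```python
import struct


def round_to(fmt, x):
    """Round a Python float (IEEE double) to struct format fmt and back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def running_sum(step, count, fmt=None):
    """Add step to an accumulator count times, rounding to fmt after each add.

    fmt=None means plain double-precision arithmetic.
    """
    r = (lambda v: v) if fmt is None else (lambda v: round_to(fmt, v))
    s = r(step)
    total = 0.0
    for _ in range(count):
        total = r(total + s)
    return total


print(round_to('<e', 2049.0))      # 2048.0: half has an 11-bit significand
print(round_to('<f', 16777217.0))  # 16777216.0: single has a 24-bit significand
for fmt in ('<e', '<f', None):
    # The half-precision total gets stuck at 256.0, far short of 1000.
    print(fmt, running_sum(0.1, 10000, fmt))
```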

Besides, a double-precision ( 64-bit ) floating multiply or add
should only take about 20% longer than a single-precision multiply
or add. The same goes for division and square roots.

] Bob Colwell
] Multiflow Computer  <Above is personal opinion only!!] 

Bob, you seem a little naive about floating-point hardware.
It's not all shift-add or shift-compare-subtract anymore.


--
	Dennis O'Connor 	oconnor@sungoddess.steinmetz.UUCP ??
				ARPA: OCONNORDM@ge-crd.arpa
        "If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"