Path: utzoo!mnetor!uunet!husc6!hao!gatech!mcnc!rutgers!umd5!purdue!i.cc.purdue.edu!k.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: Performance increase - a suggestion
Message-ID: <673@l.cc.purdue.edu>
Date: 4 Feb 88 11:02:41 GMT
References: <3127@phri.UUCP> <9408@steinmetz.steinmetz.UUCP>
Organization: Purdue University Statistics Department
Lines: 45
Summary: Not quite

In article <9408@steinmetz.steinmetz.UUCP>, oconnor@sunset.steinmetz (Dennis M. O'Connor) writes:
> An article by colwell@m6.UUCP (Robert Colwell) says:
> ] I don't believe you can do double precision math as fast as single
> ] precision math if both are implemented in the same technology.  
>  .....
 
> NO. Sorry. But the fastest floating point division and root
> algorithms use Newton-Raphson iteration, where the time to
> solution is proportional to log2( number_of_bits_of_result ).
> That is, if single-precision takes 4 iterations, double precision
> will take 5, and quad will take 6.
 
You are essentially correct about the number of _iterations_.  However,
unless the hardware itself delivers the full accuracy of each iteration,
the cost of an iteration grows faster than the length of the operands.
Hardware can be designed so that its "double precision" is not much
slower than its "single precision."  But to extend precision beyond
what the hardware provides, the number must be broken into blocks, and
the multiply unit must deliver products with twice as many bits as a
block.  In addition, floating-point arithmetic, especially if
normalized, adds so many problems that it is highly advisable to do
the operations in integer arithmetic.
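
As a rough illustration of the point about iterations versus cost, here
is a small Python sketch.  Everything in it is an assumption made for
the example, not something taken from a particular machine: a 2-bit
starting approximation, a 32-bit block size, schoolbook multiplication
of the blocks, and mantissa lengths of 24, 53, and 113 bits for
"single," "double," and "quad."  The function names are hypothetical.

```python
import math

def iterations_needed(target_bits, seed_bits):
    """Count Newton-Raphson steps, assuming each step doubles the
    number of correct bits (quadratic convergence)."""
    steps = 0
    bits = seed_bits
    while bits < target_bits:
        bits *= 2
        steps += 1
    return steps

def block_multiplies(operand_bits, block_bits):
    """Block-by-block products needed for ONE full-length multiply,
    assuming plain schoolbook multiplication of the blocks."""
    blocks = math.ceil(operand_bits / block_bits)
    return blocks * blocks

# Assumed mantissa lengths; the seed and block sizes are illustrative.
for name, bits in [("single", 24), ("double", 53), ("quad", 113)]:
    print(name, iterations_needed(bits, 2), block_multiplies(bits, 32))
```

Under these assumptions the iteration counts come out as 4, 5, and 6,
as in the quoted article, while the block products per multiply go
1, 4, and 16.  The iteration count grows slowly; the work inside each
iteration does not.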

To summarize, arithmetic at a precision beyond the built-in is only
somewhat expensive if the hardware makes results of double the built-in
length available (i.e., if 52-bit mantissas are supported, 104-bit
products must be available).  As for the number format, such arithmetic
is easiest in fixed-point, can be done with some difficulty in
unnormalized floating-point, and is extremely difficult in normalized
floating-point.
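
The need for double-length products can be seen in a minimal sketch of
fixed-point (integer) multiplication by blocks.  The 32-bit block size
and the names WORD_BITS, to_limbs, from_limbs, and mul_limbs are
assumptions for the example.  Python integers are unbounded, so the
masking below only imitates a machine whose multiplier returns a
64-bit product of two 32-bit blocks.

```python
WORD_BITS = 32                  # assumed block ("limb") size
MASK = (1 << WORD_BITS) - 1

def to_limbs(n, count):
    """Split a non-negative integer into `count` blocks, low block first."""
    return [(n >> (WORD_BITS * i)) & MASK for i in range(count)]

def from_limbs(limbs):
    """Reassemble an integer from blocks, low block first."""
    return sum(limb << (WORD_BITS * i) for i, limb in enumerate(limbs))

def mul_limbs(a, b):
    """Schoolbook multiply of two block lists.

    Each inner step forms ai*bj plus a stored block plus a carry.
    That sum never exceeds 2**(2*WORD_BITS) - 1, so it fits exactly
    in a double-length result, which is what the hardware must supply.
    """
    result = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            t = ai * bj + result[i + j] + carry   # double-length value
            result[i + j] = t & MASK              # low half stays here
            carry = t >> WORD_BITS                # high half moves up
        result[i + len(b)] = carry
    return result

# Two 52-bit operands, two blocks each, giving a 104-bit product.
x = (1 << 52) - 1
y = (1 << 52) - 3
product = from_limbs(mul_limbs(to_limbs(x, 2), to_limbs(y, 2)))
print(product == x * y)
```

If only the low half of each block product were available, the high
half would have to be rebuilt from products of half-size blocks,
roughly four multiplies in place of one.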

> Besides, double-precision ( 64 bit ) floating multiply and add
> should only take about 20% longer than single-precision multiply
> and add. Same for division and square roots.

If additional "silicon real estate" is added, this is true.  The
chip area is doubled for the adder, and probably grows by a factor
of 3-4 for the multiplier.


-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet