Path: utzoo!mnetor!uunet!husc6!hao!gatech!mcnc!rutgers!umd5!purdue!i.cc.purdue.edu!k.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: Performance increase - a suggestion
Message-ID: <673@l.cc.purdue.edu>
Date: 4 Feb 88 11:02:41 GMT
References: <3127@phri.UUCP> <9408@steinmetz.steinmetz.UUCP>
Organization: Purdue University Statistics Department
Lines: 45
Summary: Not quite

In article <9408@steinmetz.steinmetz.UUCP>, oconnor@sunset.steinmetz (Dennis M. O'Connor) writes:
> An article by colwell@m6.UUCP (Robert Colwell) says:
> ] I don't believe you can do double precision math as fast as single
> ] precision math if both are implemented in the same technology.
> .....
> NO. Sorry. But the fastest floating point division and root
> algorithms use Newton-Raphson iteration, where the time to
> solution is proportional to log2( number_of_bits_of_result ).
> That is, if single-precision takes 4 iterations, double precision
> will take 5, and quad will take 6.

You are essentially correct about the number of _iterations_. However,
unless the hardware directly supports the precision each iteration
requires, the cost of an iteration grows more quickly than the length of
the operands. It is possible to design hardware so that the hardware
"double precision" is not much slower than the hardware "single
precision." However, to extend precision beyond what is built into the
hardware, the number must be broken into blocks, and the multiply unit
must deliver results with twice as many bits as a block. In addition,
floating point arithmetic adds so many problems, especially if
normalized, that it is highly advisable to do the operations in integer
arithmetic.

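
The block-wise scheme described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the 32-bit block size and the names LIMB_BITS, to_limbs, from_limbs and multiply_limbs are hypothetical and appear nowhere in the original discussion, and plain schoolbook multiplication stands in for whatever a real implementation would use. It shows why each block product must be available at double length, and it counts the block products so the growth in cost is visible.

```python
# Illustrative sketch only: the block size and all names are assumptions.
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value, count):
    """Split a non-negative integer into `count` blocks, least significant first."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from blocks, least significant first."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def multiply_limbs(a, b):
    """Schoolbook product of two block arrays.

    Returns (result_blocks, number_of_block_products). Every block product
    is up to 2 * LIMB_BITS wide, which is the double-length result the
    multiply unit has to supply.
    """
    result = [0] * (len(a) + len(b))
    products = 0
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            # x * y + result[i + j] + carry always fits in 2 * LIMB_BITS bits.
            wide = x * y + result[i + j] + carry
            products += 1
            result[i + j] = wide & LIMB_MASK  # low half stays in this block
            carry = wide >> LIMB_BITS         # high half moves up one block
        result[i + len(b)] = carry
    return result, products


if __name__ == "__main__":
    for n in (1, 2, 4):
        a = to_limbs((1 << (LIMB_BITS * n)) - 1, n)
        blocks, products = multiply_limbs(a, a)
        assert from_limbs(blocks) == from_limbs(a) ** 2
        print(n, "blocks:", products, "block products")
```

With this method, doubling the operand length from 1 to 2 to 4 blocks raises the count of block products from 1 to 4 to 16, so each multiplication inside an iteration costs more than twice as much when the precision doubles, even though only one extra iteration is needed.
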
To summarize, arithmetic whose precision exceeds what is built in is
only moderately expensive if the hardware is designed so that results of
double the built-in length are available (i.e., if 52-bit mantissas are
supported, 104-bit products must be available). In addition, it can be
done with some difficulty in unnormalized floating-point, it is
extremely difficult in normalized floating-point, and it is easiest in
fixed-point.

> Besides, double-precision ( 64 bit ) floating multiply and add
> should only take about 20% longer than single-precision multiply
> and add. Same for division and square roots.

If additional "silicon real estate" is added, this is true. The chip
area is doubled for addition, and probably grows by a factor of 3-4 for
multiplication.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet