Path: utzoo!attcan!uunet!crdgw1!uakari.primate.wisc.edu!samsung!cs.utexas.edu!oakhill!chinds
From: chinds@oakhill.UUCP (Chris Hinds)
Newsgroups: comp.arch
Subject: Re: 68040 where is it?
Message-ID: <3812@wtkatz.oakhill.UUCP>
Date: 12 Sep 90 21:05:47 GMT
References: <1477@marlin.NOSC.MIL>
Distribution: comp.arc
Organization: Motorola Inc. Austin, Tx
Lines: 43

aburto@marlin.NOSC.MIL (Alfred A. Aburto) writes:
>Dave,
>The Weitek has other advantages over the 68040.  No doubt the Weitek uses
>64-bit (or thereabouts) registers for general-purpose operations such as
>binary arithmetic shifts and adds.  These types of operations are necessary
>for example in using the CORDIC algorithm to approximate sin, cos, sincos,
>exp, log, asin, acos, atan, sinh, and cosh.  The 040 is limited to 32-bit
>binary shifts and adds. 

>I got a feeling Dave that the 040 will do the transcendental functions with
>ieeesp (32-bit) real fast (probably quicker and just as accurate as the 68882
>at the same clock), but the ieeedp (64-bits) is going to require a bit of
>ingenuity (magic) methinks.  Just my opinion at this time.....

Alfred,

A little bit more information for your opinion...

You are correct that the 040 is limited to 32-bit binary shifts, adds, etc.
in integer code.  However, the algorithms chosen for transcendental
emulation rely primarily on the 040's FPU and, as on the 68882, are carried
out entirely in IEEE extended-precision arithmetic.  Accuracy is guaranteed
within the same bounds as the 68882, and the result is as fast as a 33-MHz
68030 system with a 68882 coprocessor for FPU support.  So your comment
about the speed of single being different from double is not accurate.  The
040 FPU hardware is optimized for double, but under emulation that advantage
disappears, and the computation takes equal time for all supported IEEE
floating-point formats.
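
To make the "equal time for all sizes" point concrete, here is a rough
sketch (not the actual 040 emulation code).  It assumes a hypothetical
routine, emulated_sin, that always does its work in one wide format and
rounds to the destination only at the very end.  Python's float (IEEE
double) stands in for extended precision, and math.sin stands in for the
real algorithm.

```python
import math
import struct

def round_to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def emulated_sin(x, dest="double"):
    # Hypothetical routine: the core computation always runs in the wide
    # working format (double here, standing in for extended precision).
    wide = math.sin(x)
    # Only this final rounding step depends on the destination format,
    # so the bulk of the work is the same for every format.
    if dest == "single":
        return round_to_single(wide)
    return wide

print(emulated_sin(0.5, "double"))
print(emulated_sin(0.5, "single"))
```

The two calls share everything except the last rounding, which is why
the destination size has little effect on the total time in a scheme
like this.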

The 040 takes an unimplemented-instruction trap on all transcendental
instructions, so part of the time to process the instruction is the overhead
of the trap, stack, etc.  The same emulation code, if used as a library,
would run as much as 33% faster than in trap-and-emulate mode.
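
For a feel of what that 33% implies, the arithmetic can be sketched as
below.  The figure can be read two ways, so the hypothetical helper
trap_overhead_share handles both; it assumes the emulation code itself
costs the same in either mode and that only the trap/stack overhead
differs.

```python
def trap_overhead_share(gain, reading):
    """Share of trap-and-emulate time spent on trap/stack overhead.

    gain    -- the quoted figure as a fraction (0.33 for "33%")
    reading -- "time_saved": the library call takes `gain` less time
               "speedup":    the library call is (1 + gain) times as fast
    """
    if reading == "time_saved":
        # t_lib = (1 - gain) * t_trap, so overhead / t_trap = gain
        return gain
    if reading == "speedup":
        # t_trap = (1 + gain) * t_lib, so overhead / t_trap = gain / (1 + gain)
        return gain / (1.0 + gain)
    raise ValueError("unknown reading: " + reading)

for reading in ("time_saved", "speedup"):
    print(reading, round(trap_overhead_share(0.33, reading), 3))
```

Either way, the overhead comes out at roughly a quarter to a third of
the trapped instruction's total time in the best case for the library.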

Chris

*************************************************
*   Motorola Microprocessor Products Sector     *
*   Austin, Tx                                  *
*                                               *
*   Chris N. Hinds <><     Standard Disclaimers *
*   oakhill!wtkatz!chinds@cs.utexas.edu         *
*   chinds@oakhill.sps.mot.com                  *
*************************************************