Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!unmvax!gatech!udel!nelson
From: nelson@udel.EDU (Mark Nelson)
Newsgroups: comp.arch
Subject: Re: Quadruple-Precision Floating Point ?
Keywords: REAL*16 hardware
Message-ID: <6053@louie.udel.EDU>
Date: 20 Dec 88 20:53:50 GMT
References: <8561@alice.UUCP> <3688@s.cc.purdue.edu> <285@loligo.fsu.edu>
Sender: usenet@udel.EDU
Reply-To: nelson@udel.EDU (Mark Nelson)
Organization: University of Delaware
Lines: 22

In article <285@loligo.fsu.edu> mccalpin@masig1.ocean.fsu.edu (John D. McCalpin) writes:
>
>Although it is true that CDC/ETA and Cray machines have the ability to 
>do 128-bit arithmetic, you probably don't want to do it on these machines.
>The relative speed of 64-bit vector instructions vs 128-bit instructions
>(which cannot currently be vectorized) is typically very close to 100:1
>on both the CDC/ETA and Cray machines.  Of course, if your code is all
>scalar anyway, then the performance degradation will not be so severe.
>
If I remember correctly from my days at CDC working with the 205, the
floating point hardware actually produced 128-bit results.  There were
different versions of the floating point instructions that returned
different (64-bit) parts of this 128-bit result.  The normal instructions
used were the ones that returned a normalized 64-bit result, but for
add and multiply (maybe divide?) there were instructions to
return either the most significant or the least significant 64 bits.
I am quite sure that these instructions existed in both scalar and
vector forms.  These instructions would allow you to add 128-bit
numbers with four additions, and this certainly vectorizes.
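
To make the idea concrete, here is a minimal sketch in Python.  It is
not 205 code: the names add_upper, add_lower and dd_add are
hypothetical, and the "upper" and "lower" results are emulated in
software on IEEE doubles with the standard TwoSum trick rather than
taken from 128-bit hardware.  A 128-bit value is assumed to be stored
as a (high, low) pair of 64-bit floats.  The sketch also renormalizes
at the end, so it does not try to reproduce the exact count of four
additions.

```python
from fractions import Fraction  # only used to check results exactly


def add_upper(a, b):
    """Emulated 'most significant half' of a + b: the rounded sum."""
    return a + b


def add_lower(a, b):
    """Emulated 'least significant half' of a + b: the rounding error.

    Uses Knuth's TwoSum, so add_upper(a, b) + add_lower(a, b) equals
    a + b exactly (barring overflow).
    """
    s = a + b
    bv = s - a
    return (a - (s - bv)) + (b - bv)


def dd_add(x, y):
    """Add two (high, low) pairs and return a renormalized (high, low)."""
    hi = add_upper(x[0], y[0])        # upper half of the high-word sum
    lo = add_lower(x[0], y[0])        # lower half of the high-word sum
    lo = lo + (x[1] + y[1])           # fold in both low words
    # Renormalize so that the high word again carries the leading bits.
    return add_upper(hi, lo), add_lower(hi, lo)


def exact(pair):
    """Exact rational value of a (high, low) pair."""
    return Fraction(pair[0]) + Fraction(pair[1])
```

Every step in dd_add is an ordinary elementwise operation with no
branches, which is why the same sequence could be applied to whole
vectors of (high, low) pairs.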

Mark Nelson
nelson@dewey.udel.edu