Xref: utzoo comp.arch:7611 comp.lang.fortran:1639
Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!cs.utexas.edu!ut-emx!reeder
From: reeder@ut-emx.UUCP (William P. Reeder)
Newsgroups: comp.arch,comp.lang.fortran
Subject: Re: Quadruple-Precision Floating Point ?
Summary: not necessarily in hardware
Keywords: REAL*16 hardware
Message-ID: <8899@ut-emx.UUCP>
Date: 20 Dec 88 18:49:56 GMT
References: <8561@alice.UUCP> <3688@s.cc.purdue.edu>
Organization: University of Texas Computation Center
Lines: 40

In article <3688@s.cc.purdue.edu> ags@s.cc.purdue.edu (Dave Seaman) writes:
>In article <8561@alice.UUCP> wcs@alice.UUCP (Bill Stewart, usually) writes:
>>Are there any machines that implement quad-precision (128-bit) floating
>>point numbers in hardware?
>Basically all of the CDC, ETA, and Cray machines support 128-bit floating
>point numbers, but it is called double precision, not quad precision.
>Dave Seaman	ags@j.cc.purdue.edu
 
Sure, they support them, but would you say they support them "in hardware"?

I have used/programmed both a CDC 170/750 and a Cray X/MP-24 (in FORTRAN
and in assembly).

The CDC machine had a 96-bit accumulator used by all f.p. instructions.
Some instructions (FXi) performed an arithmetic operation and returned the
upper 48 bits of the result; others (DXi) returned the lower 48 bits.  The
operands were always single-precision floating-point values (in
registers).  So, for example, getting the double-precision sum of two
(single-precision) values required two instructions, an FXi and a DXi.
Unfortunately, when I think of double precision I expect to be able to
(for example) add two double-precision operands and get a double-precision
result.  Doing this on the CDC requires a function or subroutine; it is
not provided *in hardware*.
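
To make the distinction concrete, here is a rough sketch in Python of what
such a subroutine has to do.  It is only an analogy: it uses IEEE doubles
as the "single-precision" type, not the CDC's 60-bit format, and the names
two_sum and dd_add are made up for illustration.  two_sum stands in for
the FXi/DXi pair (upper and lower halves of one sum of two single-precision
operands); dd_add is the software routine needed to add two full
double-precision operands, each held as a (hi, lo) pair.

```python
def two_sum(a, b):
    """Return (hi, lo): hi is the rounded sum, lo the rounding error.

    Plays the role of an FXi/DXi pair: two single-word operands in,
    upper and lower halves of the sum out.  hi + lo equals a + b exactly
    (assuming round-to-nearest and no overflow).
    """
    hi = a + b
    bb = hi - a
    lo = (a - (hi - bb)) + (b - bb)
    return hi, lo


def dd_add(x, y):
    """Add two double-length values, each given as a (hi, lo) pair.

    This is the part that needs a subroutine: the operands are
    themselves two words long, so one hi/lo pair of instructions
    is not enough.
    """
    s, e = two_sum(x[0], y[0])   # add the upper halves, keep the error
    e += x[1] + y[1]             # fold in both lower halves
    return two_sum(s, e)         # renormalize into a clean (hi, lo) pair
```

For example, dd_add((1.0, 0.0), (1e-20, 0.0)) keeps the 1e-20 in the lower
word, where an ordinary single-word add would simply lose it.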

The X/MP is even worse: it does not even have the equivalent of the DXi
instructions found on the CDCs.  As a result, double-precision
computations are done entirely in software (meaning without the benefit of
any special dp hardware) and are anywhere from 30 to 90 times slower than
the corresponding single-precision operations.  This is probably the
reason dp operations can't be vectorized (as someone pointed out in an
earlier posting).
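
To give a feel for why the lack of a "lower half" instruction hurts, here
is a sketch (again in Python, again on IEEE doubles) of one well-known way
to recover the lower half of a product using only ordinary single-word
operations: Dekker's product with Veltkamp splitting.  This is an
illustration of the general problem only; it says nothing about how the
Cray library routines were actually written, and the names split and
two_prod are hypothetical.

```python
SPLITTER = 134217729.0  # 2**27 + 1, the splitting constant for 53-bit doubles


def split(a):
    """Split a into hi + lo, each half short enough to multiply exactly."""
    c = SPLITTER * a
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Return (p, err): p is the rounded product, err the rounding error.

    With no hardware help this takes 17 floating-point operations,
    versus one multiply for the single-precision result alone.
    Assumes round-to-nearest and no overflow or underflow.
    """
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    err = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, err
```

A full double-length multiply needs several more operations on top of this
to combine the cross terms and renormalize, so a large slowdown relative to
a single hardware multiply is not surprising.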

What about newer Crays?  Or the Cray-2?

William Reeder
University of Texas
Computation Center
reeder@emx.utexas.edu
-- 
DISCLAIMER:	I speak only for myself, and usually only to myself.