Xref: utzoo comp.arch:7611 comp.lang.fortran:1639
Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!cs.utexas.edu!ut-emx!reeder
From: reeder@ut-emx.UUCP (William P. Reeder)
Newsgroups: comp.arch,comp.lang.fortran
Subject: Re: Quadruple-Precision Floating Point ?
Summary: not necessarily in hardware
Keywords: REAL*16 hardware
Message-ID: <8899@ut-emx.UUCP>
Date: 20 Dec 88 18:49:56 GMT
References: <8561@alice.UUCP> <3688@s.cc.purdue.edu>
Organization: University of Texas Computation Center
Lines: 40

In article <3688@s.cc.purdue.edu> ags@s.cc.purdue.edu (Dave Seaman) writes:
>In article <8561@alice.UUCP> wcs@alice.UUCP (Bill Stewart, usually) writes:
>>Are there any machines that implement quad-precision (128-bit) floating
>>point numbers in hardware?
>Basically all of the CDC, ETA, and Cray machines support 128-bit floating
>point numbers, but it is called double precision, not quad precision.
>Dave Seaman ags@j.cc.purdue.edu

Sure, they support them, but would you say they support them "in
hardware"? I have used/programmed both a CDC 170/750 and a Cray X/MP-24
(in FORTRAN and in assembly).

The CDC machine had a 96-bit accumulator used by all f.p. instructions.
Some instructions (FXi) performed an arithmetic operation and returned
the upper 48 bits of the result; others (DXi) returned the lower 48
bits. The operands were always single-precision floating-point values
(in registers). So, for example, getting the double-precision sum of two
(single-precision) values required two instructions, an FXi and a DXi.
Unfortunately, when I think of double precision, I expect to be able to
(for example) add two double-precision operands and get a
double-precision result. Doing this on the CDC would require a function
or subroutine; it is not provided *in hardware*.

The X/MP is even worse; it does not even have the equivalent of the DXi
instructions found on the CDCs.
As a result, double-precision computations are done entirely in software
(meaning without the benefit of any special dp hardware) and are
anywhere from 30 to 90 times slower than the corresponding
single-precision operations. This is probably the reason dp operations
can't be vectorized (as someone pointed out in an earlier posting).

What about newer Crays? Or the Cray-2?

William Reeder
University of Texas Computation Center
reeder@emx.utexas.edu
--
DISCLAIMER: I speak only for myself, and usually only to myself.
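
The two-instruction sum described for the CDC (one instruction for the
upper half of the result, one for the lower half) has a rough modern
analogue that may make the point concrete. The sketch below is only an
analogy, written for IEEE doubles in Python rather than 60-bit CDC
words: it uses Knuth's TwoSum to recover the part of a sum that rounding
throws away, which is not bit-for-bit what DXi returned. The names
two_sum and dd_add are hypothetical, chosen for illustration. Adding two
full (hi, lo) operands takes a routine like dd_add, which is the kind of
function or subroutine the post says the hardware does not provide.

```python
def two_sum(a, b):
    """Return (hi, lo): hi is the rounded sum a + b, lo is the rounding error.

    This is Knuth's TwoSum; barring overflow, hi + lo equals a + b exactly.
    """
    hi = a + b                        # loose analogue of the "upper half" result
    b_seen = hi - a                   # the part of b that made it into hi
    a_seen = hi - b_seen              # the part of a that made it into hi
    lo = (a - a_seen) + (b - b_seen)  # loose analogue of the "lower half" result
    return hi, lo


def dd_add(x, y):
    """Add two (hi, lo) pairs in software (hypothetical helper, simple variant)."""
    s, e = two_sum(x[0], y[0])        # add the high words, keep the error
    e += x[1] + y[1]                  # fold in both low words
    return two_sum(s, e)              # renormalize into a (hi, lo) pair


if __name__ == "__main__":
    tiny = 2.0 ** -60                 # too small to survive in 1.0 + tiny
    print(two_sum(1.0, tiny))
    print(dd_add((1.0, tiny), (1.0, tiny)))
```

Even this simple variant of dd_add costs several dependent
floating-point operations per add, which gives some feel for why a
software-only implementation runs many times slower than a single
hardware add.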