Path: utzoo!attcan!uunet!ncrlnk!ncrcae!hubcap!gatech!gitpyr!loligo!mccalpin
From: mccalpin@loligo.fsu.edu (John McCalpin)
Newsgroups: comp.arch
Subject: Re: Quadruple-Precision Floating Point ?
Summary: it "sort of" vectorizes
Keywords: REAL*16 hardware
Message-ID: <292@loligo.fsu.edu>
Date: 21 Dec 88 13:13:03 GMT
References: <8561@alice.UUCP> <3688@s.cc.purdue.edu> <285@loligo.fsu.edu> <6053@louie.udel.EDU>
Reply-To: mccalpin@loligo.UUCP (John McCalpin)
Organization: Supercomputer Computations Research Institute
Lines: 25

I previously wrote that 128-bit floating-point arithmetic did not 
vectorize on the Cyber 205/ETA-10.  In response:

In article <6053@louie.udel.EDU> nelson@udel.EDU (Mark Nelson) writes:
>If I remember correctly from my days at CDC working with the 205,
>...(t)here were different versions of the floating point instructions
>that returned different (64-bit) parts of this 128 bit result.
	...stuff deleted...
>These instructions would allow you to add 128 bit
>numbers with four additions, and this certainly vectorizes.
>Mark Nelson
>nelson@dewey.udel.edu

The above is mostly correct.  The upper- and lower-result forms of the
multiply and add instructions do exist in vector form, but far more than
4 of them are required to get the complete result.  Daan Sandee of CDC
and SCRI has vectorized the
linked triad operation in 128 bits, and I believe that it required
about 30 passes through the pipe, as opposed to 1 pass for the 64-bit
linked triad.  If this recollection is correct, then the 128-bit version
can run as fast as 1/30 of the 64-bit speed, rather than the 1/100 which
is typically obtained using scalar code.  Clarifications, Daan?
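
To show why four additions are not enough, here is a minimal sketch of a
128-bit-style add built from 64-bit pieces.  It is NOT the Cyber 205
instruction sequence and not Daan's code.  It assumes IEEE 64-bit
arithmetic with round-to-nearest (which the 205 did not have) and uses
the well-known branch-free "two-sum" trick to recover the rounding error
that the 205's lower-result instructions would deliver directly.  The
names two_sum and dd_add are made up for this illustration.

```python
def two_sum(a, b):
    """Return (s, e): s is the rounded sum a + b, e is the rounding error.

    With round-to-nearest doubles and no overflow, s + e equals a + b exactly.
    Six floating-point operations, no branches.
    """
    s = a + b
    bv = s - a                      # the part of b that made it into s
    e = (a - (s - bv)) + (b - bv)   # what was lost from a, plus what was lost from b
    return s, e


def dd_add(x, y):
    """Add two extended values, each held as a (hi, lo) pair meaning hi + lo.

    Hypothetical helper for illustration: 4 two_sum calls plus 2 plain adds,
    i.e. 26 floating-point operations for one extended-precision add.
    """
    s, e = two_sum(x[0], y[0])   # add the high words, keep the error
    t, f = two_sum(x[1], y[1])   # add the low words, keep the error
    e += t                       # fold the low-word sum into the high-word error
    s, e = two_sum(s, e)         # renormalize so e is small relative to s
    e += f                       # fold in the low-word error
    s, e = two_sum(s, e)         # renormalize again
    return s, e


# A low-order term that a plain 64-bit add would simply drop:
x = (1.0, 2.0 ** -60)
y = (1.0, 2.0 ** -60)
hi, lo = dd_add(x, y)
print(hi, lo)
```

Every step is straight-line arithmetic with no data-dependent branch, so
the same sequence can be applied element by element to whole vectors.
The operation count for the add alone is already well above four, and a
linked triad also needs an extended multiply on top of it.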

John D. McCalpin
mccalpin@masig1.ocean.fsu.edu
mccalpin@fsu	(BITNET or MFENET)