Path: utzoo!attcan!uunet!ncrlnk!ncrcae!hubcap!gatech!gitpyr!loligo!mccalpin
From: mccalpin@loligo.fsu.edu (John McCalpin)
Newsgroups: comp.arch
Subject: Re: Quadruple-Precision Floating Point ?
Summary: it "sort of" vectorizes
Keywords: REAL*16 hardware
Message-ID: <292@loligo.fsu.edu>
Date: 21 Dec 88 13:13:03 GMT
References: <8561@alice.UUCP> <3688@s.cc.purdue.edu> <285@loligo.fsu.edu> <6053@louie.udel.EDU>
Reply-To: mccalpin@loligo.UUCP (John McCalpin)
Organization: Supercomputer Computations Research Institute
Lines: 25

I previously wrote that 128-bit floating-point arithmetic did not
vectorize on the Cyber 205/ETA-10.  In response:

In article <6053@louie.udel.EDU> nelson@udel.EDU (Mark Nelson) writes:
>If I remember correctly from my days at CDC working with the 205,
>...(t)here were different versions of the floating point instructions
>that returned different (64-bit) parts of this 128 bit result.
   ...stuff deleted...
>These instructions would allow you to add 128 bit
>numbers with four additions, and this certainly vectorizes.
>Mark Nelson
>nelson@dewey.udel.edu

The above is mostly correct.  The multiply/add upper and lower
instructions do exist in vector forms, but a lot more than 4 are
required to get the complete result.  Daan Sandee of CDC and SCRI
has vectorized the linked triad operation in 128 bits, and I
believe that it required about 30 passes through the pipe, as
opposed to 1 pass for the 64-bit linked triad.

If this recollection is correct, then the 128-bit can be as fast
as 1/30 of the 64-bit speed, rather than the 1/100 which is
typically obtained using scalar code.

Clarifications, Daan ?

John D. McCalpin
mccalpin@masig1.ocean.fsu.edu
mccalpin@fsu (BITNET or MFENET)
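
To illustrate why "a lot more than 4" operations are needed, here is a
minimal sketch of extended-precision addition built from pairs of 64-bit
words. It is NOT the Cyber 205 instruction sequence and NOT Daan Sandee's
code. It is the generic "double-double" technique (Knuth's two-sum), written
in Python with IEEE doubles standing in for the 64-bit words. The function
names two_sum, quick_two_sum and dd_add are made up for this example.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly.

    Knuth's branch-free version: 6 floating-point operations.
    """
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Same result as two_sum, but assumes |a| >= |b|: 3 operations."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x, y):
    """Add two (hi, lo) pairs, returning a renormalized (hi, lo) pair.

    This is the cheaper "sloppy" variant: 6 + 2 + 3 = 11 operations.
    """
    s, e = two_sum(x[0], y[0])    # exact sum of the high words
    e = e + (x[1] + y[1])         # fold in the low words
    return quick_two_sum(s, e)    # renormalize so that |lo| << |hi|


if __name__ == "__main__":
    x = (1.0, 2.0 ** -70)
    y = (2.0 ** -60, 0.0)
    hi, lo = dd_add(x, y)
    exact = sum(Fraction(v) for v in x + y)
    print(hi, lo)
    print("pair is exact:", Fraction(hi) + Fraction(lo) == exact)
    print("plain 64-bit add loses the small term:", x[0] + y[0] == 1.0)
```

Every operation here is elementwise and branch-free, so it can be applied
across whole vectors, but one extended-precision add already costs about
eleven 64-bit operations in this formulation. A linked triad also needs an
extended-precision multiply, which is more expensive still, so a count on
the order of 30 passes is at least plausible. Hardware that returns the
lower half of a result directly would shorten two_sum, but how much it saves
on the 205 is not something this sketch can show.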