Xref: utzoo comp.lang.fortran:4234 comp.lang.c:34419
Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!boingo.med.jhu.edu!haven!adm!cmcl2!lanl!ttw
From: ttw@lanl.gov (Tony Warnock)
Newsgroups: comp.lang.fortran,comp.lang.c
Subject: Re: Fortran vs. C for numerical work (SUMMARY)
Message-ID: <7445@lanl.gov>
Date: 30 Nov 90 15:21:09 GMT
References: <1990Nov29.040910.7400@kithrup.COM> <7318@lanl.gov> <6493:Nov3006:03:3790@kramden.acf.nyu.edu>
Organization: Los Alamos Natl Lab, Los Alamos, N.M.
Lines: 55

>From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
>
>In article <7318@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> The Crays have an integer multiply unit for addresses.  This mult
>> takes 4 clocks.
>
>But isn't that only for the 24-bit integer? If you want to multiply full
>words you have to (internally) convert to floating point, multiply, and
>convert back.
>
>I have dozens of machines that can handle a 16MB computation; I'm not
>going to bother with a Cray for those. The biggest advantage of the Cray
>line (particularly the Cray-2) is its huge address space.
>
>So what's the actual time for multiplying integers?
>---Dan

The time for multiplying 32-bit integers on the YMP is 5 clock periods.
Normally YMP addresses are interpreted as 64-bit words, not as bytes.
On the previous models of CRAYs, 24 bits are used to address 16 Mwords,
not Mbytes. (This saves 3 wires per address data path.)

As most work on CRAYs is done on words (numerical) or packed-character
strings, multiplication of longer integers is not provided for in the
hardware. Personally I would like to have long integer support.

The CRAY architecture supports a somewhat strange multiplication method
which will yield a 48-bit product if the input words have a total length
of no more than 48 bits. That is, one can multiply two 24-bit
quantities, a 16-bit and a 32-bit quantity, a 13-bit and a 35-bit
quantity, or shorter things.
This operation takes two shifts and one multiply. The shifts may be
overlapped, so the time is 3 clocks for the two shifts and 7 clocks for
the multiply if the shifts are known, or 4 clocks for the shifts and
7 clocks for the multiply if the shifts are variable. It's a bit of a
pain to program, but the compiler does it for us.

Another form of integer multiplication is used sometimes: the integers
are converted to floating, then multiplied, and the result converted
back to integer. This method fails if an intermediate value exceeds
46 bits of significance. The time is 2 clocks for producing a "magic"
constant, 3 clocks each for two integer adds (reduces to 4 total
because of pipelining), 6 clocks each for two floating adds (reduces
to 6 because of pipelining overlap with the integer add), 7 clocks for
the floating multiply, 6 clocks for another floating add, and 6 clocks
for another integer multiply. Total is 29 clocks if no other operations
may be pipelined with these operations.

If the quantities being multiplied are addresses, some of the above is
eliminated, bringing the result down to 20 clocks. Still, this is not
as good as the floating point performance.

All of the above may be vectorized, which would result in 3 clocks per
result in vector mode.
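
To make the operand-width rule for the 48-bit product concrete, here is
a small sketch in Python. It only models the arithmetic condition, not
the shift-and-multiply sequence the hardware performs. The function
names and the `limit` parameter are made up for illustration, and the
sketch assumes non-negative operands whose combined bit lengths may
total exactly 48, as in the 24+24, 16+32 and 13+35 examples.

```python
def fits_48_bit_product(a: int, b: int, limit: int = 48) -> bool:
    """Return True if the bit lengths of a and b sum to at most `limit`.

    Hypothetical helper: models only the width rule from the post,
    for non-negative operands.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch covers non-negative operands only")
    return a.bit_length() + b.bit_length() <= limit


def product_48(a: int, b: int) -> int:
    """Multiply a and b, refusing operand pairs that break the width rule."""
    if not fits_48_bit_product(a, b):
        raise OverflowError("operand widths total more than 48 bits")
    # If a < 2**m and b < 2**n with m + n <= 48, then a * b < 2**48.
    return a * b


if __name__ == "__main__":
    pairs = [
        ((1 << 24) - 1, (1 << 24) - 1),  # two 24-bit quantities
        ((1 << 16) - 1, (1 << 32) - 1),  # 16-bit and 32-bit
        ((1 << 13) - 1, (1 << 35) - 1),  # 13-bit and 35-bit
        (1 << 24, (1 << 24) - 1),        # 25-bit and 24-bit: too wide
    ]
    for a, b in pairs:
        print(a.bit_length(), b.bit_length(), fits_48_bit_product(a, b))
```

The check is deliberately conservative: it rejects some pairs whose
actual product would still fit in 48 bits, because it looks only at
operand widths, which is all a compiler knows when it picks the shifts.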