Xref: utzoo comp.arch:12803 comp.lang.fortran:2764
Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!aplcen!uakari.primate.wisc.edu!brutus.cs.uiuc.edu!jarthur!uci-ics!ucla-cs!math.ucla.edu!sonia!pmontgom
From: pmontgom@sonia.math.ucla.edu (Peter Montgomery)
Newsgroups: comp.arch,comp.lang.fortran
Subject: Re: Pipelined FP add (Separate Register Sets)
Message-ID: <2100@sunset.MATH.UCLA.EDU>
Date: 17 Dec 89 00:18:36 GMT
References: <241@dg.dg.com> <33570@hal.mips.COM> <3740@brazos.Rice.edu> <38132@ames.arc.nasa.gov> <28413@amdcad.AMD.COM>
Sender: news@MATH.UCLA.EDU
Reply-To: pmontgom@math.ucla.edu (Peter Montgomery)
Distribution: na
Organization: UCLA Mathematics Department
Lines: 41

In article <28413@amdcad.AMD.COM> davec@cayman.amd.com (Dave Christie) writes:
>
>Quite true, but (IMHO) for most applications, int<->fp traffic is not very
>high, so having to explicitly move some data around is no big deal ... as
>long as it is done reasonably efficiently so that you don't blow one or two
>applications out of the water.
>

        I have written a multiple precision integer arithmetic package.
Its time-critical routines are conditionally compiled so the algorithms
can be individually optimized for different machines (yes, I resemble
Herman Rubin in wanting full access to machine instructions from
high-level languages).  On some machines, almost all arithmetic is
integer.

        Recently, while transporting my program to an Alliant, I
vectorized some of these codes.  A frequent primitive operation
requires finding integers X, Y such that

        X*2**30 + Y = A*B + C

where A, B, C are given (vectors of) integers up to 2^30 (2^30 is the
radix for multiple-precision arithmetic).  The Alliant has a vectorized
integer multiply instruction, but only for the lower 32 bits of a
product.  Hence I can get

        Y = IAND(A*B + C, 2**30 - 1)

with vectorized integer operations.
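
The masking step above can be sketched in Python rather than Fortran. This is only an illustration under stated assumptions: `low_digit` and `WORD_MASK` are hypothetical names, and the 32-bit mask models an integer multiply that keeps only the low 32 bits of the product, which is an assumption about the machine and not something stated about the Alliant beyond the sentence above.

```python
RADIX_BITS = 30
RADIX_MASK = (1 << RADIX_BITS) - 1   # 2**30 - 1
WORD_MASK = (1 << 32) - 1            # assumed: hardware keeps only the low 32 bits

def low_digit(a, b, c):
    """Model of Y = IAND(A*B + C, 2**30 - 1) with 32-bit wraparound."""
    wrapped = (a * b + c) & WORD_MASK   # low 32 bits of the multiply-add
    return wrapped & RADIX_MASK         # IAND with 2**30 - 1

# The discarded high bits never affect the low 30 bits, so the result
# agrees with the exact remainder modulo 2**30.
a, b, c = 123456789, 987654321, 555555555
assert low_digit(a, b, c) == (a * b + c) % (1 << 30)
```

The point of the sketch is that truncating the product to 32 bits loses nothing that the final 30-bit mask would have kept.
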
To get X (the upper half of the product) using vector operations, I use
floating point, such as

        X = NINT((DBLE(A)*DBLE(B) + DBLE(C - Y))*0.5**30)

(DBLE = convert integer to double, NINT = convert double to nearest
integer).  This statement uses 4 conversions between integer and
floating point while doing only 3 floating point operations.  The
vectorized DBLE is compiled inline, but the vectorized NINT is done in
a subroutine; the NINT subroutine uses 10% of my program's time (but
total program time is down from before vectorization).

        BTW, I have asked the X3J3 committee to add this operation
(i.e., given A, B, C, D, all nonnegative, and either A < D or B < D,
find X, Y such that X*D + Y = A*B + C) to Fortran 8x.
--------
        Peter Montgomery
        pmontgom@MATH.UCLA.EDU
        Department of Mathematics, UCLA, Los Angeles, CA 90024
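
A rough Python sketch of the floating-point step for X described above, offered as an illustration rather than the package's actual code. `high_digit` is a hypothetical name; `float()` stands in for DBLE and `round()` for NINT, and Python floats are assumed to be IEEE 754 doubles. The product A*B can need up to 60 bits, more than a double's 53-bit significand, but A*B + C - Y is an exact multiple of 2**30, so the accumulated rounding error (at most about 2**7) is far too small to move the scaled value away from the correct integer.

```python
import random

RADIX = 1 << 30

def high_digit(a, b, c, y):
    """Model of X = NINT((DBLE(A)*DBLE(B) + DBLE(C - Y)) * 0.5**30)."""
    # float() plays the role of DBLE, round() the role of NINT.
    return round((float(a) * float(b) + float(c - y)) * 0.5 ** 30)

# Compare against exact integer arithmetic on random operands below 2**30.
random.seed(1989)
for _ in range(10000):
    a, b, c = (random.randrange(RADIX) for _ in range(3))
    x_exact, y = divmod(a * b + c, RADIX)   # y is what the IAND step yields
    assert high_digit(a, b, c, y) == x_exact
```

Subtracting Y before scaling is what makes the rounding safe: without it, the fractional part Y/2**30 could sit close to one half and NINT could round the wrong way.
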