Xref: utzoo comp.arch:12803 comp.lang.fortran:2764
Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!aplcen!uakari.primate.wisc.edu!brutus.cs.uiuc.edu!jarthur!uci-ics!ucla-cs!math.ucla.edu!sonia!pmontgom
From: pmontgom@sonia.math.ucla.edu (Peter Montgomery)
Newsgroups: comp.arch,comp.lang.fortran
Subject: Re: Pipelined FP add (Separate Register Sets)
Message-ID: <2100@sunset.MATH.UCLA.EDU>
Date: 17 Dec 89 00:18:36 GMT
References: <241@dg.dg.com> <33570@hal.mips.COM> <3740@brazos.Rice.edu> <38132@ames.arc.nasa.gov> <28413@amdcad.AMD.COM>
Sender: news@MATH.UCLA.EDU
Reply-To: pmontgom@math.ucla.edu (Peter Montgomery)
Distribution: na
Organization: UCLA Mathematics Department
Lines: 41

In article <28413@amdcad.AMD.COM> davec@cayman.amd.com (Dave Christie) writes:
>
>Quite true, but (IMHO) for most applications, int<->fp traffic is not very
>high, so having to explicitly move some data around is no big deal ... as
>long as it is done reasonably efficiently so that you don't blow one or two
>applications out of the water.
>
	I have written a multiple precision integer arithmetic package.
Its time-critical routines are conditionally compiled so
the algorithms can be individually optimized for different machines 
(yes, I resemble Herman Rubin in wanting full access to machine
instructions from high-level languages).  On some machines,
almost all arithmetic is integer.  Recently, while transporting 
my program to an Alliant, I vectorized some of these codes.  
A frequent primitive operation requires finding integers X, Y such that 
X*2**30 + Y = A*B + C and 0 <= Y < 2**30, where A, B, C are given
(vectors of) integers up to 2**30 (2**30 is the radix for
multiple-precision arithmetic).  The Alliant has a vectorized integer
multiply instruction, but it returns only the lower 32 bits of a product.
Hence I can get Y = IAND(A*B + C, 2**30 - 1) with vectorized integer
operations.
To get X (upper half of product) using vector operations, 
I use floating point, such as

	X = NINT((DBLE(A)*DBLE(B) + DBLE(C - Y))*0.5**30)

(DBLE = convert integer to double, NINT = convert double to nearest integer).
This statement uses 4 conversions between integer and floating point
while doing only 3 floating point operations.
The vectorized DBLE is compiled inline, but the vectorized NINT
is a subroutine call; that NINT subroutine uses 10% of my program's time
(though total program time is still lower than before vectorization).
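
	For readers who want to try the trick, here is a scalar sketch in
Python rather than Fortran.  It is an illustration, not the Alliant code.
The names low_digit, high_digit and mul_add_split are made up for this
example, and the 32-bit wraparound of the integer multiply is emulated
with a mask.  The floating-point step works because the rounding error in
the double-precision product and sum is at most a few hundred units, which
scaling by 0.5**30 shrinks far below 0.5.

```python
RADIX_BITS = 30
RADIX = 1 << RADIX_BITS          # 2**30, the multiple-precision radix
MASK32 = 0xFFFFFFFF              # emulates a 32-bit integer register


def low_digit(a, b, c):
    """Y = IAND(A*B + C, 2**30 - 1), using only the low 32 bits."""
    low32 = (a * b + c) & MASK32  # what a 32-bit multiply-add would keep
    return low32 & (RADIX - 1)


def high_digit(a, b, c, y):
    """X = NINT((DBLE(A)*DBLE(B) + DBLE(C - Y)) * 0.5**30)."""
    t = (float(a) * float(b) + float(c - y)) * 0.5 ** RADIX_BITS
    # Python's round() breaks ties to even while NINT rounds them away
    # from zero; t is never near a tie here, so the results agree.
    return int(round(t))


def mul_add_split(a, b, c):
    """Return (X, Y) with X*2**30 + Y == a*b + c and 0 <= Y < 2**30."""
    y = low_digit(a, b, c)
    x = high_digit(a, b, c, y)
    return x, y


# Usage: check one case against exact integer arithmetic.
x, y = mul_add_split(123456789, 987654321, 555555555)
assert x * RADIX + y == 123456789 * 987654321 + 555555555
assert 0 <= y < RADIX
```

As in the Fortran statement, the sketch performs four integer/floating
conversions (three float() calls and one round()) around three
floating-point operations.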

	BTW, I have asked the X3J3 committee to add this operation 
(i.e., given A, B, C, D, all nonnegative, and either A < D or B < D,
find X, Y such that X*D + Y = A*B + C) to Fortran 8x.
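
	A minimal Python sketch of what such an operation would compute
follows.  The name mul_add_divmod is hypothetical, and the sketch assumes
the intended result has 0 <= Y < D, which the request above does not state
explicitly.  Python integers do not overflow, so the double-length
intermediate A*B + C, the part a machine instruction would have to
provide, comes for free here.

```python
def mul_add_divmod(a, b, c, d):
    """Return (X, Y) with X*d + Y == a*b + c, assuming 0 <= Y < d.

    Preconditions follow the request in the post: a, b, c, d are
    nonnegative and either a < d or b < d.
    """
    if min(a, b, c) < 0 or d <= 0:
        raise ValueError("a, b, c must be nonnegative and d positive")
    if not (a < d or b < d):
        raise ValueError("need a < d or b < d")
    # The exact double-length product plus c, then one division by d.
    return divmod(a * b + c, d)


# Usage: with d = 2**30 this is the radix-2**30 primitive described above.
x, y = mul_add_divmod(123456789, 987654321, 555555555, 2**30)
assert x * 2**30 + y == 123456789 * 987654321 + 555555555
```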


--------
        Peter Montgomery
        pmontgom@MATH.UCLA.EDU 
	Department of Mathematics, UCLA, Los Angeles, CA 90024