Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mstan!amull
From: amull@Morgan.COM (Andrew P. Mullhaupt)
Newsgroups: comp.arch
Subject: Re: F.P. vs. arbitrary-precision
Message-ID: <1683@s6.Morgan.COM>
Date: 11 Sep 90 15:37:10 GMT
References: <3755@osc.COM> <4513@taux01.nsc.com> <119244@linus.mitre.org> <2534@l.cc.purdue.edu>
Organization: Morgan Stanley & Co. NY, NY
Lines: 53

In article <2534@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:
> In article <1660@s6.Morgan.COM>, amull@Morgan.COM (Andrew P. Mullhaupt) writes:
> > As for integer divide, less of a case can be made, and I can do without
> > it.
> 
> Maybe you do not see a need for it, but lots of people do; even floating
> divide with integer quotient and floating remainder.  

This one you get with most IEEE coprocessors via one or two instructions.

> 
> > Since this is comp.arch, I'll ask. How much chip would a 16x32 integer
> > multiply take? You could use this for a lot of address arithmetic
> > and build the full 32x32 multiply out of a few calls to this. In the
> > same vein, an 8x32 might even be a profitable use of chip. How does
> > this VLSI trade-off work out? 
> 
> The floating point unit already has at least most of the hardware required.
> If the 53x53 floating point unit were slightly modified to give 64 bits of
> output, almost no additional chip area would be required.  Floating point
> units still do their arithmetic as integer arithmetic, with front and back
> ends.

On machines with an integrated FPU (like the i486 and i860) this makes
some sense. In fact _no_ modification is necessary to "widen" the FPU,
since it already computes 64-bit mantissae in most cases. Some
languages already provide support for this - notably Turbo Pascal on
the PC-compatibles, with its extended and (just for you, Herman) comp
types. On machines with off-chip FPUs (i.e. the coprocessors of the
world) this idea isn't necessarily a winner, since there can be a
significant delay moving the data between chips. The real problem,
though, is for chips like the SPARC, which would have to put the
instruction on a chip where _no_ real estate is devoted to the FPU.
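
For concreteness, here is a rough C sketch of the composition asked
about in the quoted question above: a full 32x32 multiply built from
two 16x32 partial products. The names mul16x32 and mul32x32 are made
up for illustration, and mul16x32 merely stands in for the
hypothetical narrow hardware multiplier; nothing here describes an
actual chip.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16x32 hardware multiply:
   16-bit by 32-bit operands, 48-bit product held in 64 bits. */
uint64_t mul16x32(uint16_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Full 32x32 -> 64 multiply from two 16x32 partial products:
   a * b = (a_hi * b) * 2^16 + (a_lo * b). */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint64_t lo = mul16x32((uint16_t)(a & 0xFFFFu), b);
    uint64_t hi = mul16x32((uint16_t)(a >> 16), b);
    return (hi << 16) + lo;   /* one shift and one add to combine */
}
```

Two narrow multiplies, a shift and an add give the full product, and
address arithmetic with a small multiplier needs only the first call.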


Now the need for integer multiplies is real - although strength
reduction is very useful, it is nearly impossible if the access
to the array is not direct. In particular, pivoting (often
necessary in numerical linear algebra) disorganizes the sequential
nature of array accesses and prevents strength reduction. This
type of access usually occurs in a subroutine where the array is
a formal parameter, and so the indexing cannot be simplified at
compile time in most languages. (FORTRAN 90 is something of an
exception, since the shape of an array parameter is always known
at run time and the compiler could take advantage of this. In C
this situation is pretty grim if integer multiplications are
slow...) N.B.: To all those who have "never seen code where the array
shape is not known at compile time" I would ask that they stop
sending me mail and take a look at almost any library of subroutines
for numerical computation. This kind of code is very common.
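
A small C sketch of the contrast, assuming a row-major matrix passed
as a pointer plus a leading dimension lda known only at run time (the
function names and the pivot vector piv are illustrative, not taken
from any particular library). In the direct version the row offset
i*lda reduces to a running pointer increment; in the pivoted version
the rows arrive in an order decided at run time, so each access pays
for a genuine integer multiply.

```c
#include <stddef.h>

/* Dot product of column j with x, rows visited in order 0,1,2,...
   The multiply i*lda is strength-reduced by hand to p += lda. */
double col_dot_direct(const double *a, size_t lda, size_t nrows,
                      size_t j, const double *x)
{
    double s = 0.0;
    const double *p = a + j;
    for (size_t i = 0; i < nrows; i++, p += lda)
        s += *p * x[i];
    return s;
}

/* Same computation, but rows visited in the order piv[0], piv[1], ...
   Successive offsets are unrelated, so piv[i]*lda stays a multiply. */
double col_dot_pivoted(const double *a, size_t lda, size_t nrows,
                       size_t j, const size_t *piv, const double *x)
{
    double s = 0.0;
    for (size_t i = 0; i < nrows; i++)
        s += a[piv[i] * lda + j] * x[i];
    return s;
}
```

Since lda is a formal parameter, neither routine can have its
indexing folded at compile time; only the first can avoid the
multiply inside the loop.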

Later,
Andrew Mullhaupt