Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mstan!amull
From: amull@Morgan.COM (Andrew P. Mullhaupt)
Newsgroups: comp.arch
Subject: Re: F.P. vs. arbitrary-precision
Message-ID: <1683@s6.Morgan.COM>
Date: 11 Sep 90 15:37:10 GMT
References: <3755@osc.COM> <4513@taux01.nsc.com> <119244@linus.mitre.org> <2534@l.cc.purdue.edu>
Organization: Morgan Stanley & Co. NY, NY
Lines: 53

In article <2534@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:
> In article <1660@s6.Morgan.COM>, amull@Morgan.COM (Andrew P. Mullhaupt) writes:
> > As for integer divide, less of a case can be made, and I can do without
> > it.
>
> Maybe you do not see a need for it, but lots of people do; even floating
> divide with integer quotient and floating remainder.

This one you get with most IEEE coprocessors via one or two instructions.

> > Since this is comp.arch, I'll ask. How much chip would a 16x32 integer
> > multiply take? You could use this for a lot of address arithmetic
> > and build the full 32x32 multiply out of a few calls to this. In the
> > same vein, an 8x32 might even be a profitable use of chip. How does
> > this VLSI trade-off work out?
>
> The floating point unit already has at least most of the hardware required.
> If the 53x53 floating point unit were slightly modified to give 64 bits of
> output, almost no additional chip area would be required. Floating point
> units still do their arithmetic as integer arithmetic, with front and back
> ends.

On machines with an integrated FPU (like the i486 and i860) this makes
some sense. In fact _no_ modification is necessary to "widen" the FPU
since it already computes 64 bit mantissae in most cases. Now some
languages have provided support for this - notably Turbo Pascal on the
PC-compatibles with its extended and (just for you, Herman) comp types.

On machines with off-chip FPUs, (i.e. the coprocessors of the world)
this idea isn't necessarily a winner since there can be a significant
delay moving the data between chips.

No, the problem here is for chips like the SPARC which would have to
put the instruction on a chip where _no_ real estate is devoted to the
FPU.

Now the need for integer multiplies is real - although strength
reduction is very useful, it is nearly impossible if the access to the
array is not direct. In particular, pivoting (often necessary in
numerical linear algebra) disorganizes the sequential nature of array
accesses and prevents strength reduction. This type of access usually
occurs in a subroutine where the array is a formal parameter, and so
the indexing cannot be simplified at compile time in most languages.
(FORTRAN 90 is somewhat of an exception since the shape of an array
parameter is always known at run time and the compiler could take
advantage of this. In C this situation is pretty grim if integer
multiplications are slow...)

N.B.: To all those who have "never seen code where the array shape is
not known at compile time" I would ask that they stop sending me mail
and take a look at almost any library of subroutines for numerical
computation. This kind of code is very common.

Later,
Andrew Mullhaupt
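
P.S. For anyone who wants the two points above in concrete form, here
is a rough C sketch. It is only an illustration under stated
assumptions, not code from any real library: mul16x32 is a hypothetical
stand-in for the 16x32 hardware multiply (here it simply uses the
host's own multiply), mul32x32 shows one way the full product might be
assembled from two such calls, and pivoted_column_sum is an invented
routine showing the kind of indexed access where the multiply cannot be
strength-reduced away. The names, the row-major layout, and the perm[]
pivot vector are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16x32 hardware multiply: a 16-bit
   operand times a 32-bit operand, giving a product of up to 48 bits. */
static uint64_t mul16x32(uint16_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Full 32x32 -> 64 bit product from two 16x32 partial products:
   a * b = (a_hi * b) * 2^16 + (a_lo * b). */
static uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t a_hi = (uint16_t)(a >> 16);
    return (mul16x32(a_hi, b) << 16) + mul16x32(a_lo, b);
}

/* Sum one column of a row-major matrix passed as a formal parameter.
   The leading dimension lda is known only at run time, and the rows
   are visited in pivot order perm[] (assumed layout for this sketch). */
static double pivoted_column_sum(const double *a, size_t lda,
                                 const size_t *perm, size_t nrows,
                                 size_t col)
{
    double sum = 0.0;
    assert(col < lda);
    for (size_t i = 0; i < nrows; i++) {
        /* perm[i] is data dependent, so this multiply cannot be turned
           into a running increment of the address by the compiler. */
        size_t offset =
            (size_t)mul32x32((uint32_t)perm[i], (uint32_t)lda) + col;
        sum += a[offset];
    }
    return sum;
}
```

With sequential rows (perm[i] == i) a compiler could replace the
multiply by adding lda to the offset on each trip through the loop; it
is the indirection through perm[] that rules this out and leaves one
integer multiply per element accessed.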