Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!mstan!amull
From: amull@Morgan.COM (Andrew P. Mullhaupt)
Newsgroups: comp.arch
Subject: Re: F.P. vs. arbitrary-precision
Summary: Integer multiply is needed
Message-ID: <1660@s6.Morgan.COM>
Date: 9 Sep 90 14:45:27 GMT
References: <3755@osc.COM> <4513@taux01.nsc.com> <119244@linus.mitre.org> <6837.26e7ee92@vax1.tcd.ie>
Organization: Morgan Stanley & Co. NY, NY
Lines: 46

> There is practically no processing done which depends on integer * and / being
> fast (accessing an array of structures doesn't count because a smart compiler
> can use shifts and adds), and don't bother giving anecdotal cases because it's
> still less than 1% of the total. Therefore chip space was not wasted on making
> these fast.

I won't give examples but you're wrong about the address arithmetic.
Some machines (like i486, RS/6000) have integer multiplies and some
(SPARC) do not. Now the compilers of the world can get rid of integer
multiplies in address arithmetic _if_ they can figure out the sizes
of the arrays. This isn't trivial when the array may be a formal 
parameter (i.e. the size may not be fixed). A great deal of code uses
this kind of function argument. (Nearly all of numerical linear algebra,
a great deal of optimization...) I have been pretty annoyed at how slow
this stuff gets on the SPARC and pretty happy with how often my i486
home computer (running UNIX & DOS) beats the Sparcstations on identical
code. However, "fast" is relative in this matter. The RS/6000 has an integer
multiply, but it also has extremely fast floating point. This is really
good unless the array indexing gets in the way of the FP instructions,
which turns out to be a bit tricky. The thing probably needs to have
an even faster integer multiply (and this probably requires some sort of
superscalar capability) in order to get the best performance. 
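
To make the formal parameter point concrete, here is a minimal sketch in C.
The function names and FIXED_COLS are made up for illustration. When the
row length arrives as an argument, the index computation contains a real
multiply; when it is a compile-time constant, the compiler is free to use
a shift instead.

```c
#include <stddef.h>

/* Hypothetical example: element (i, j) of a row-major matrix whose row
   length n is a formal parameter, so it is unknown at compile time. */
double get_dynamic(const double *a, size_t n, size_t i, size_t j)
{
    /* i * n has to be computed at run time: a genuine integer multiply
       (or a call to a software multiply routine if the chip has none). */
    return a[i * n + j];
}

/* Hypothetical fixed row length, chosen as a power of two. */
#define FIXED_COLS 8

/* Same access, but the row length is a compile-time constant. */
double get_fixed(double a[][FIXED_COLS], size_t i, size_t j)
{
    /* i * 8 can be turned into i << 3 by the compiler; no multiply needed. */
    return a[i][j];
}
```

Whether a particular compiler actually emits the shift in the second case
is up to that compiler; the sketch only shows why the first case is harder.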

The lack of an integer multiply on the SPARCs (I think some of them are
supposed to have it but the ones I use don't...) is pretty bad. But it is
compensated for by the register windows, which make function calls
extremely quick. I think that the real estate wasn't so much thought
to be wasted on integer multiply as better used for register windows.
A machine for scientific computing should probably not go so far as
to eliminate the integer multiply.

As for integer divide, less of a case can be made, and I can do without
it. Although integer quotient and remainder are probably computed
simultaneously, some languages (like C, with its separate / and %
operators) force one to ask for those results separately. If this weren't
the case, integer divides would probably take up 25% less of our time.
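
For what it's worth, the ANSI C library does have div() in <stdlib.h>,
which hands back quotient and remainder together. A minimal sketch follows;
split_index is a made-up helper name. Whether this ends up as a single
divide instruction depends on the compiler and library, so treat it as an
illustration of the interface, not a performance claim.

```c
#include <stdlib.h>

/* Hypothetical helper: split a flat offset into (row, col) for a
   row length n. Assumes n is nonzero. */
void split_index(int offset, int n, int *row, int *col)
{
    div_t qr = div(offset, n);  /* one call returns both results */
    *row = qr.quot;             /* same value as offset / n */
    *col = qr.rem;              /* same value as offset % n */
}
```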

Since this is comp.arch, I'll ask. How much chip area would a 16x32 integer
multiplier take? You could use this for a lot of address arithmetic
and build the full 32x32 multiply out of a few calls to this. In the
same vein, an 8x32 might even be a profitable use of chip area. How does
this VLSI trade-off work out?
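
To show what I mean by building the 32x32 out of the 16x32, here is a
sketch in C. mul16x32 only stands in for the hypothetical hardware
operation (it is modelled with an ordinary C multiply), and mul32x32 is
a made-up name. Two 16x32 steps plus a shift and an add give the full
64-bit product.

```c
#include <stdint.h>

/* Stand-in for a hypothetical 16x32 hardware multiplier.
   The result needs at most 48 bits. */
uint64_t mul16x32(uint16_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Full 32x32 -> 64 bit product from two 16x32 steps:
   a * b = (a_hi * b) * 2^16 + (a_lo * b). */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint64_t lo = mul16x32((uint16_t)(a & 0xFFFFu), b);
    uint64_t hi = mul16x32((uint16_t)(a >> 16), b);
    return (hi << 16) + lo;  /* cannot overflow: a * b < 2^64 */
}
```

For address arithmetic, where an index often fits in 16 bits, a single
mul16x32 step would already be enough.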


Later,
Andrew Mullhaupt