Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!mstan!amull
From: amull@Morgan.COM (Andrew P. Mullhaupt)
Newsgroups: comp.arch
Subject: Re: F.P. vs. arbitrary-precision
Summary: Integer multiply is needed
Message-ID: <1660@s6.Morgan.COM>
Date: 9 Sep 90 14:45:27 GMT
References: <3755@osc.COM> <4513@taux01.nsc.com> <119244@linus.mitre.org> <6837.26e7ee92@vax1.tcd.ie>
Organization: Morgan Stanley & Co. NY, NY
Lines: 46

> There is practically no processing done which depends on integer * and / being
> fast (accessing an array of structures doesn't count because a smart compiler
> can use shifts and adds), and don't bother giving anecdotal cases because it's
> still less than 1% of the total. Therefore chip space was not wasted on making
> these fast.

I won't give examples but you're wrong about the address arithmetic.
Some machines (like i486, RS/6000) have integer multiplies and some
(SPARC) do not. Now the compilers of the world can get rid of integer
multiplies in address arithmetic _if_ they can figure out the sizes of
the arrays. This isn't trivial when the array may be a formal parameter
(i.e. the size may not be fixed). A great deal of code uses this kind of
function argument. (Nearly all of numerical linear algebra, a great deal
of optimization...) I have been pretty annoyed at how slow this stuff
gets on the SPARC and pretty happy with how often my i486 home computer
(running UNIX & DOS) beats the Sparcstations on identical code.

However, fast is relative in this matter. The RS/6000 has integer
multiply but has an extremely fast set of floating point instructions.
This is really good unless the array indexing gets in the way of the FP
instructions, which turns out to be a bit tricky. The thing probably
needs to have an even faster integer multiply (and this probably
requires some sort of superscalar capability) in order to get the best
performance.
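
To make the address arithmetic point concrete, here is a rough sketch in
C. The names (NCOLS, get_fixed, get_runtime, col_sum) are made up for
illustration and are not from any particular library. It assumes a
row-major matrix stored in a flat array. With a compile-time row length
the index multiply can become a shift; with the row length passed as a
formal parameter, a single access needs a real multiply. Inside a loop
that walks the rows in order the multiply can be traded for an add, but
whether a given compiler actually does so is an assumption, not
something measured here.

```c
#include <stddef.h>

#define NCOLS 8  /* hypothetical row length, fixed at compile time */

/* Row length is a constant power of two, so i * NCOLS can be
   compiled as a shift; no multiply instruction is required. */
double get_fixed(const double *a, size_t i, size_t j)
{
    return a[i * NCOLS + j];
}

/* Row length ld is a formal parameter, unknown at compile time:
   i * ld needs an integer multiply (or a software multiply routine
   on a machine that lacks the instruction). */
double get_runtime(const double *a, size_t ld, size_t i, size_t j)
{
    return a[i * ld + j];
}

/* Sum column j over n rows. Because i advances by one each time,
   the index i * ld + j can be kept in k and updated with an add. */
double col_sum(const double *a, size_t n, size_t ld, size_t j)
{
    double s = 0.0;
    size_t k = j;                 /* index of element (0, j) */
    for (size_t i = 0; i < n; i++) {
        s += a[k];
        k += ld;                  /* step to the next row: add only */
    }
    return s;
}
```

The add-only form helps only when rows are visited in a regular order.
An access whose row index comes from data (a pivot, a permutation, a
sparse index) still ends up in the get_runtime shape.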

The lack of an integer multiply on the SPARCs (I think some of them are
supposed to have it but the ones I use don't...) is pretty bad. But it
is compensated by the register windows which allow it to call functions
extremely quickly. I think that the real estate wasn't so much thought
to be wasted on integer multiply as better used for register windows. A
machine for scientific computing should probably not go so far as to
eliminate the integer multiply.

As for integer divide, less of a case can be made, and I can do without
it. Although integer quotient and remainder are probably computed
simultaneously, some languages (like C) force one to get those results
separately. If this weren't the case, integer divides would probably
take up 25% less of our time.

Since this is comp.arch, I'll ask. How much chip area would a 16x32
integer multiply take? You could use this for a lot of address
arithmetic and build the full 32x32 multiply out of a few calls to this.
In the same vein, an 8x32 might even be a profitable use of chip area.
How does this VLSI trade-off work out?

Later,
Andrew Mullhaupt
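
As a footnote to the 16x32 question, here is a minimal sketch in C of
how a full 32x32 product could be assembled from a 16x32 primitive. The
function mul16x32 is a hypothetical stand-in for the hardware operation
(here it is just modelled with ordinary C arithmetic), and mul32x32 and
mul32x32_low are likewise made-up names. It assumes unsigned operands;
signed multiplies would need extra correction steps that are not shown.

```c
#include <stdint.h>

/* Stand-in for the hypothetical hardware primitive: unsigned 16-bit
   by 32-bit multiply. The product fits in 48 bits. */
static uint64_t mul16x32(uint16_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Full 32x32 -> 64-bit unsigned product from two 16x32 steps:
   a * b = a_lo * b + ((a_hi * b) << 16). */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t a_hi = (uint16_t)(a >> 16);
    return mul16x32(a_lo, b) + (mul16x32(a_hi, b) << 16);
}

/* Address arithmetic usually wants only the low 32 bits. */
uint32_t mul32x32_low(uint32_t a, uint32_t b)
{
    return (uint32_t)mul32x32(a, b);
}
```

When one operand (an array dimension, say) is known to fit in 16 bits,
a_hi is zero and a single mul16x32 step gives the whole product, which
is the case the address arithmetic argument cares about.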