Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!elroy.jpl.nasa.gov!decwrl!deccrl!bloom-beacon!eru!hagbard!sunic!mcsun!hp4nl!charon!dik From: dik@cwi.nl (Dik T. Winter) Newsgroups: comp.arch Subject: Re: RISC integer multiply/divide (was Re: Snake) Message-ID: <3275@charon.cwi.nl> Date: 6 Apr 91 02:02:09 GMT References: <1991Apr3.232757.8020@bingvaxu.cc.binghamton.edu> <9538@mentor.cc.purdue.edu> <1991Apr4.213550.8106@bingvaxu.cc.binghamton.edu> Sender: news@cwi.nl Organization: CWI, Amsterdam Lines: 77 In article <1991Apr4.213550.8106@bingvaxu.cc.binghamton.edu> kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes: > In article <9538@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > [discussion of double-prec multiply, esp 32x32->64 (yuk!)] > >This is true, but you would need either 17x17 -> 34 multiply, or > >(16+sign)*(16+sign)->(32+sign), as well as a considerable use of > >keeping track of signs/carries, or substantially more additions. > > Well I'm not sure I understand this -- it's probably a slightly > different implementation of the what I have in mind. To understand it, see below. > > One of the reasons for keeping the 3-multiply trick in mind is that some > chips include a dedicated fast multiplier. It's not unknown to link up this > guy with the rest of the alu so that simultaneous add and mults can be done > (after all, we spent a lot of money to get that multiplier IN there so let's > use it for everything :-). Instructures like multiply-and-add, > multiply-shift-and-add are therefore not unheard of. (I have in mind esp the > TI 320 series, I presume those 6000 thingies can do similar things, but I > haven't looked at the gory details). TI 320, yes, RS6000, no. As I mailed you, this came from a remark from me about the m88k. As on that machine with proper code a multiply takes only a single cycle, there is no way to improve it with 3 multiplies only. My findings are that using an order n squared long integer multiply still pays for fairly large numbers (hundreds of decimal digits on current architectures). > > The appended programs show how to obtain single (i.e. 32x32->32) and > double (32x32->64) results for multiply using a 16x16->32 multiplier. The > routines are pretty rough (written on the fly), so forgive any glaring (or > subtle) errors. Well, it is just those subtle errors that make a routine useless, and are inherently unforgivable. Glaring errors we can live with, they are easily corrected! (It is just like my talk with a representative from some, unnamed, computer manifacturer about a design error I found in an instruction. It made the instruction inherently useless, because in some, admittedly obscure, cases the instruction would deliver the wrong result. Considering that the desing is already more than 10 years on the market one might wonder how many programs silently failed due to this!) > > >BTW, if unsigned multiplication is not available, additional problems > >arise in any case. I dunno, anyhow: > I'm not sure I understand this either -- my routines use signed stuff. I can > appreciate the problems regarding missing unsigned arithmetic, however. Yup, but here we find the first error: is >> sign extending or not? You did not deal with this. (And yes, there are machines that do not have an arithmetic right shift.) But I do not think that is essential. > > ===single precision (32x32->32) multiply using (16x16->32) multiplier=== I will not comment on this; it is pretty straightforward. And contains similar errors as the next one. > ===double prec multiply (32x32->64) using (16x16->32) multiplier=== I presume you assume short is 16 bits, int is 32 bits and long is 64 bits? some declarations here.... anyhow, LNum is long, Num is int, Snum is short. I assume that all multiplies are signed 16*16->32 (as promised). Some more details omitted, and next (hix,lox,hiy,loy are SNums, t1 is a LNum): > t1 += (((Num)-hix+lox)*((Num)hiy-loy))<32 multiply! This is the subtle error; it will fail on some well-choosen examples. So do you now understand Herman Rubins remark that you need either 17*17->34 multiply or (16+sign)*(16+sign)->(32+sign)? Anyhow, your implementation is worthless, even if we expunge the glaring errors. -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl