Newsgroups: comp.arch Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!sarah!bingnews!kym From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) Subject: Re: RISC integer multiply/divide (was Re: Snake) Message-ID: <1991Apr4.213550.8106@bingvaxu.cc.binghamton.edu> Organization: State University of New York at Binghamton References: <3237@charon.cwi.nl> <1991Apr3.232757.8020@bingvaxu.cc.binghamton.edu> <9538@mentor.cc.purdue.edu> Date: Thu, 4 Apr 1991 21:35:50 GMT In article <9538@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: [discussion of double-prec multiply, esp 32x32->64 (yuk!)] >This is true, but you would need either 17x17 -> 34 multiply, or >(16+sign)*(16+sign)->(32+sign), as well as a considerable use of >keeping track of signs/carries, or substantially more additions. Well I'm not sure I understand this -- it's probably a slightly different implementation of the what I have in mind. One of the reasons for keeping the 3-multiply trick in mind is that some chips include a dedicated fast multiplier. It's not unknown to link up this guy with the rest of the alu so that simultaneous add and mults can be done (after all, we spent a lot of money to get that multiplier IN there so let's use it for everything :-). Instructures like multiply-and-add, multiply-shift-and-add are therefore not unheard of. (I have in mind esp the TI 320 series, I presume those 6000 thingies can do similar things, but I haven't looked at the gory details). So for such machines we can both make use of the trick and try to use as many fast instruction combinations as possible; the trouble _may_ be worth it in some applications (although I haven't actually measured anything). The appended programs show how to obtain single (i.e. 32x32->32) and double (32x32->64) results for multiply using a 16x16->32 multiplier. The routines are pretty rough (written on the fly), so forgive any glaring (or subtle) errors. >BTW, if unsigned multiplication is not available, additional problems >arise in any case. I'm not sure I understand this either -- my routines use signed stuff. I can appreciate the problems regarding missing unsigned arithmetic, however. -kym C code: ===single precision (32x32->32) multiply using (16x16->32) multiplier=== /*** Taking: (aL+b)(cL+d) = LLac+Lad+Lbc+bd L(a-b)(c-d) = Lac-Lad-Lbc+Lbd where L=2^k we find: L(a-b)(c-d) + (aL+b)(cL+d) = LLac+Lac+Lbd+bd Thereore: mul(x,y) = hix hiy <<2k + hix hiy<>SHORTBITS; lox=x&MASK(SHORTBITS); hiy=y>>SHORTBITS; loy=y&MASK(SHORTBITS); t2 = t1 = (Num)lox*loy; t2 += (Num)hix*hiy; /* multiply-and-add */ t2 += ((Num)-hix+lox)*((Num)hiy-loy); /* multiply-and-add */ t1 += t2<64) using (16x16->32) multiplier=== #define SHORTBITS 16 typedef long LNum; typedef int Num; typedef short SNum; #define BIT(N) (1L<<(N)) #define MASK(N) (BIT(N)-1) LNum mul(x,y) Num x,y; { SNum hix,hiy,lox,loy; LNum t1; hix=x>>SHORTBITS; lox=x&MASK(SHORTBITS); hiy=y>>SHORTBITS; loy=y&MASK(SHORTBITS); t1 = (Num)lox*loy; t1 += ((Num)hix*hiy)<