Newsgroups: comp.arch Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!sarah!bingnews!kym From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) Subject: Re: RISC integer multiply/divide (was Re: Snake) Message-ID: <1991Apr6.170236.2628@bingvaxu.cc.binghamton.edu> Organization: State University of New York at Binghamton References: <9538@mentor.cc.purdue.edu> <1991Apr4.213550.8106@bingvaxu.cc.binghamton.edu> <3275@charon.cwi.nl> Date: Sat, 6 Apr 1991 17:02:36 GMT In article <3275@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes: >In article <1991Apr4.213550.8106@bingvaxu.cc.binghamton.edu> kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes: [I didn't understand the need for 17x17->34 arithmetic do do the 3 multiply trick] >To understand it, see below. Well I'm still not convinced. In the followup article I _think_ I have a working 32x32->64 multiply (not that I've checked it exhaustively mind you). Both the naive and 3-multiply versions seem to get the same answers over a large number of cases (and I did some independent checking of the ``end conditions'' of all simple patterns of n-bits rotated in a 32-bit field and those pretty pattern suggested by Knuth that are easy to eyeball). I don't see anywhere that uses 33x33->66 bit arithmetic although a bit of sign-testing goes on before we do the product-of-differences stuff. Forgive me if I'm wrong, but to convert the unsigned routines to signed ones just requires a couple of further 32-bit adds; still no funny multiplies. [I talk about multiply-and-add and shift-and-add perhaps having an impact on 3-m vs 4-m; esp on DSP chips such as TI 320, _maybe_ on 6000] >TI 320, yes, RS6000, no. As I mailed you, this came from a remark from me >about the m88k. As on that machine with proper code a multiply takes only >a single cycle, there is no way to improve it with 3 multiplies only. Well I don't know what you _think_ you said in your email. But all I have on file is ``all those halfword adds are not worth the pain''; I don't see anything there about m88k's (although that may have been implicit, it doesn't read that way). >My findings are that using an order n squared long integer multiply still pays >for fairly large numbers (hundreds of decimal digits on current architectures). Well on any machine where multiplies < about 6.5 average instructions I would agree with you (see my previous post for derivation). In the context of other architectures, which are abundant but not common :-) maybe this is not the case. As all on comp.arch know by now, I'm not one for taking people's word for anything when I can measure it for myself. [long analysis of poor programs] >Anyhow, your implementation is worthless, even if we expunge the glaring errors. Sorry, but I thought it was apparent they were meant to be illustrative, not implementations. It's pretty hard to make C do a real 16x16->32 multiply in the first place :-) As illustrations I guess they leave something to be desired due to the various ambiguities you outlined, but I hope some of this was cleared up by the follow-up post of what (I hope) are working versions. -kym