Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!elroy.jpl.nasa.gov!decwrl!deccrl!bloom-beacon!eru!hagbard!sunic!mcsun!hp4nl!charon!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.arch
Subject: Re: RISC integer multiply/divide (was Re: Snake)
Message-ID: <3275@charon.cwi.nl>
Date: 6 Apr 91 02:02:09 GMT
References: <1991Apr3.232757.8020@bingvaxu.cc.binghamton.edu> <9538@mentor.cc.purdue.edu> <1991Apr4.213550.8106@bingvaxu.cc.binghamton.edu>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 77

In article <1991Apr4.213550.8106@bingvaxu.cc.binghamton.edu> kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
 > In article <9538@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
 > [discussion of double-prec multiply, esp 32x32->64 (yuk!)]
 > >This is true, but you would need either 17x17 -> 34 multiply, or 
 > >(16+sign)*(16+sign)->(32+sign), as well as a considerable use of
 > >keeping track of signs/carries, or substantially more additions.
 > 
 > Well I'm not sure I understand this -- it's probably a slightly
 > different implementation of the what I have in mind.
To understand it, see below.
 > 
 > One of the reasons for keeping the 3-multiply trick in mind is that some
 > chips include a dedicated fast multiplier. It's not unknown to link up this 
 > guy with the rest of the alu so that simultaneous add and mults can be done 
 > (after all, we spent a lot of money to get that multiplier IN there so let's 
 > use it for everything :-). Instructures like multiply-and-add, 
 > multiply-shift-and-add are therefore not unheard of. (I have in mind esp the 
 > TI 320 series, I presume those 6000 thingies can do similar things, but I 
 > haven't looked at the gory details).
TI 320, yes, RS6000, no.  As I mailed you, this came from a remark from me
about the m88k.  As on that machine with proper code a multiply takes only
a single cycle, there is no way to improve it with 3 multiplies only.  My
findings are that using an order n squared long integer multiply still pays
for fairly large numbers (hundreds of decimal digits on current architectures).
 > 
 > The appended programs show how to obtain single (i.e. 32x32->32) and
 > double (32x32->64) results for multiply using a 16x16->32 multiplier. The 
 > routines are pretty rough (written on the fly), so forgive any glaring (or 
 > subtle) errors.
Well, it is just those subtle errors that make a routine useless, and are
inherently unforgivable.  Glaring errors we can live with, they are easily
corrected!  (It is just like my talk with a representative from some,
unnamed, computer manifacturer about a design error I found in an instruction.
It made the instruction inherently useless, because in some, admittedly
obscure, cases the instruction would deliver the wrong result.  Considering
that the desing is already more than 10 years on the market one might wonder
how many programs silently failed due to this!)
 > 
 > >BTW, if unsigned multiplication is not available, additional problems
 > >arise in any case.
I dunno, anyhow:
 > I'm not sure I understand this either -- my routines use signed stuff. I can 
 > appreciate the problems regarding missing unsigned arithmetic, however.
Yup, but here we find the first error: is >> sign extending or not?  You did
not deal with this.  (And yes, there are machines that do not have an
arithmetic right shift.)  But I do not think that is essential.
 > 
 > ===single precision (32x32->32) multiply using (16x16->32) multiplier===
I will not comment on this; it is pretty straightforward.  And contains
similar errors as the next one.

 > ===double prec multiply (32x32->64) using (16x16->32) multiplier===
I presume you assume short is 16 bits, int is 32 bits and long is 64 bits?

some declarations here.... anyhow, LNum is long, Num is int, Snum is short.
I assume that all multiplies are signed 16*16->32 (as promised).

Some more details omitted, and next (hix,lox,hiy,loy are SNums, t1 is a LNum):
 > 	t1 += (((Num)-hix+lox)*((Num)hiy-loy))<<SHORTBITS;
Several problems here:  I do not know the current C standard well enough
but I would write '(Num)-hix' as '-(Num)hix' unless you are sure that promotion
to Num occurs before negation (in that case, why the case?).  I think it will
fail if hix=0x8000 and lox=1.  Also the promotions to Num of lox and loy are
not clear.  These are the glaring errors.  I will assume that you intended
that the expressions '-hix+lox' and 'hiy-loy' are still signed 16 bit numbers
(otherwise the assumption about the multiply would fail!).  Let's see:
Suppose, x = y = -2147483647; we find hix = -32768, lox = 1, hiy = -32768
and loy = 1.  Next, -hix+lox = 32769, but we have to truncate to 1 because
we are using this as an operand of the 16*16->32 multiply!  This is the
subtle error; it will fail on some well-choosen examples.
So do you now understand Herman Rubins remark that you need either 17*17->34
multiply or (16+sign)*(16+sign)->(32+sign)?

Anyhow, your implementation is worthless, even if we expunge the glaring errors.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl