Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!ames!pacbell!osc!jgk From: jgk@osc.COM (Joe Keane) Newsgroups: comp.arch Subject: Re: Integer multiply and killer micros Summary: Lookup tables are your friend, but hardware is faster. Keywords: lookup tables Message-ID: <1829@osc.COM> Date: 9 Jan 90 20:37:56 GMT References: <158@csinc.UUCP> <787@stat.fsu.edu> <42701@lll-winken.LLNL.GOV> <5842@ncar.ucar.edu> <490@qusunl.queensu.CA> Reply-To: jgk@osc.COM (Joe Keane) Organization: Object Sciences Corp., Menlo Park, CA Lines: 23 In article Daniel.Stodolsky@cs.cmu.edu writes: >If my memory serves me correctly, one should be able to compute a 32 x >32 -> 64 bit multiply with four 16x16 -> 32 multiplies, 4 32 bit adds >and a few shits. So why not put some of big memory killer micros to >work and have a 16 by 16 multiply lookup table? It would consume 10 megs >of core, but that's nothing for a KILLER MICRO. Assume memory access >with a cache miss is around 3 cycles and one can schedule to avoid >register interlocking (as in HP-PA), it seems possible to do 32x32 -> 64 >( results in registers) in about 20 cycles. I'm a big supporter of lookup tables, and i've dealt with some pretty strange ones. One is a 12trits -> 16bits table to evaluate Othello positions. Or a table of rint(log(1+exp(x*LB))/LB) to do logarithm-representation addition. There's an interesting way to do multiplies which doesn't involve such a big table. You can use a table of squares and more adds and subtracts. It gets a little hairy to do 32x32 -> 64 in 16-bit chunks, but the table only takes up 512KB or so. I don't know how you got 10MB for your table. Of course, since everyone uses this table, you can put it in ROM. But wait, you can use less gates with a PLA, or just random logic. And once you've done that, you might as well throw in the shifter and adder. What the heck, buy a DSP, that's what it's for. :-)