Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!brutus.cs.uiuc.edu!jarthur!bridge2!mips!hal!mark
From: mark@mips.COM (Mark G. Johnson)
Newsgroups: comp.arch
Subject: Re: Integer multiply and killer micros
Message-ID: <34259@mips.mips.COM>
Date: 9 Jan 90 00:50:16 GMT
References: <158@csinc.UUCP> <787@stat.fsu.edu> <42701@lll-winken.LLNL.GOV> <5842@ncar.ucar.edu> <490@qusunl.queensu.CA>
Sender: news@mips.COM
Reply-To: mark@mips.COM (Mark G. Johnson)
Organization: MIPS Computer Systems, Inc.
Lines: 30

In article Daniel.Stodolsky@cs.cmu.edu writes:
>
>So why not put some of big memory killer micros to
>work and have a 16 by 16 multiply lookup table? It would consume 10 megs
>of core, but that's nothing for a KILLER MICRO. Assume memory access
>with a cache miss is around 3 cycles and one can schedule to avoid
>register interlocking (as in HP-PA), it seems possible to do 32x32 -> 64
>( results in registers) in about 20 cycles.
>

The idea above proposes to use 80 million bits of RAM and 20 clock
cycles to compute a 32b integer multiply. This is noncompetitive
when compared to killer micros, which multiply more quickly and
consume far less real estate. Instead of lookup tables they
implement dedicated hardware:

	R6000:    16 cycles    32x32 -> 64
	R3000:    12 cycles    32x32 -> 64
	M88000:    4 cycles    32x32 -> 32 **

	** 88k computes the 32 lsb's of the 64b product
	   (upper bits are discarded).

If, as rumored, a new imul instruction is added to SPARC, you can bet
its implementation in hardware will be just as fast as, or perhaps
faster than, those above. This is what Doctor Ross is good at. And
of course the T.I. folks have lots of DSP experience with blazing
fast hw multipliers.
-- 
 -- Mark Johnson
	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
	(408) 991-0208    mark@mips.com  {or ...!decwrl!mips!mark}
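
For readers following the arithmetic, the 32x32 -> 64 scheme in the quoted proposal can be sketched roughly as below: split each operand into 16-bit halves, form four 16x16 -> 32 partial products, and combine them with shifts and adds. This is only an illustration, not anything from the original posts. The names mul16x16 and mul32x32 are hypothetical, and mul16x16 computes its product directly as a stand-in for the table lookup the proposal describes.

```c
#include <stdint.h>

/* Hypothetical stand-in for one table lookup: a 16x16 -> 32 bit product.
   In the quoted proposal this would be a memory read; it is computed
   directly here (an assumption) so the sketch stays small. */
static uint32_t mul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* 32x32 -> 64 bit multiply built from four 16x16 partial products. */
uint64_t mul32x32(uint32_t x, uint32_t y)
{
    uint16_t xl = (uint16_t)(x & 0xFFFFu);  /* low  half of x */
    uint16_t xh = (uint16_t)(x >> 16);      /* high half of x */
    uint16_t yl = (uint16_t)(y & 0xFFFFu);  /* low  half of y */
    uint16_t yh = (uint16_t)(y >> 16);      /* high half of y */

    uint64_t ll = mul16x16(xl, yl);         /* weight 2^0  */
    uint64_t lh = mul16x16(xl, yh);         /* weight 2^16 */
    uint64_t hl = mul16x16(xh, yl);         /* weight 2^16 */
    uint64_t hh = mul16x16(xh, yh);         /* weight 2^32 */

    /* The two middle terms are summed in 64 bits, so the carry
       out of 32 bits is not lost before the shift. */
    return (hh << 32) + ((lh + hl) << 16) + ll;
}
```

Calling mul32x32(65536u, 65536u) exercises only the high-half product. The four lookups plus the shifts and adds are the work the proposal budgets about 20 cycles for, which is what the dedicated multipliers listed above are being compared against.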