Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!utgpu!water!watmath!clyde!rutgers!husc6!linus!necntc!necis!encore!fay From: fay@encore.UUCP Newsgroups: comp.arch Subject: Re: Divides (Was RE: What should be in hardware but isn't) Message-ID: <1990@encore.UUCP> Date: Thu, 1-Oct-87 10:47:54 EDT Article-I.D.: encore.1990 Posted: Thu Oct 1 10:47:54 1987 Date-Received: Sat, 3-Oct-87 06:49:04 EDT References: <581@l.cc.purdue.edu> <18336@amdcad.AMD.COM> Reply-To: fay@encore.UUCP (Peter Fay) Organization: Encore Computer Corp, Marlboro, MA Lines: 49 In article <7481@steinmetz.steinmetz.UUCP> oconnor@sunray.UUCP (Dennis Oconnor) writes: *In article <6370@apple.UUCP> baum@apple.UUCP (Allen Baum) writes: **I said: Its VERY difficult to make fixed point division run faster than **a bit per cycle, without a LOT of hardware. By leaving out the **special purpose speedup stuff, you can afford to include some VERY **useful general purpose speedup stuff: More registers ... branch folding ... ** *** This is not really true. If you have a fast multiplier ( which is *** a good idea for many applications ) you can do division very much *** quicker than one cycle per bit, relatively easily, especially for *** long word lengths. In fact, you can do division in something like *** *** C + (multiply_lateny * (Int_Round_up( log_base2( word_length )) - K) *** *** where C and K are positive integer small constants dependant on *** how you implement your algorithm. The technique to use is *** Newton-Raphson iteration with a first-guess look-up table. *** "The official divide algorithm of the IBM-360/95 and Cray-1 (I think :-)" *** The additional hardware needed (besides a fast multiplier) is TWIT. If the Newton-Raphson is not expensive in hardware, and is faster than one cycle per bit (~O(log2)), why not always design it into CPUs? (That is, into CPUs with wide words.) *It has the look-up table in random logic ( only need a few bits ). *Has a clever normalize-denominator instruction (needed). *Does the iteration in assembly language (it's a RISC machine) *so no additional datapath is required. And will be in silicon RSN. * Hey, wait a minute! Are you saying you do almost the whole operation in assembler? The only extras you need are normalize-denominator instruction (neat idea) and the look-up table for the initial approximation of the divisor reciprocal (do you mean RAM or ROM when you say "random logic"?)???? Wow!!! Whatever happened to doing things the hard way? (:-) $64 Question: Since NR-Iteration doesn't give accurate remainder, how do you produce one? Or do you skip remainders altogether? *... Details will have *to wait for the official public release... * When will this be? -- peter fay fay@multimax.arpa {allegra|compass|decvax|ihnp4|linus|necis|pur-ee|talcott}!encore!fay