Xref: utzoo comp.arch:10086 sci.math:6915 rec.puzzles:3604 Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!amdcad!weitek!sci!kenm From: kenm@sci.UUCP (Ken McElvain) Newsgroups: comp.arch,sci.math,rec.puzzles Subject: Re: Divide by three? Summary: exact solution Message-ID: <50602@sci.UUCP> Date: 4 Jun 89 01:00:42 GMT References: Organization: Silicon Compilers Systems Corp. San Jose, Ca Lines: 65 In article , shs@uts.amdahl.com (Steve Schoettler) writes: > > Here's a puzzle: > What's the fastest way to divide an 11 bit number by three, > on a processor that doesn't have any multiply or divide instructions? > > There is not enough room in memory for a table lookup. > > Suppose (A): answer must be exact (to the nearest bit). > (B): answer can be approximate. > > Here are a couple of ideas for (B): > Case A) Exact division by 3 with small memory. let x = 4 5 4 3 2 num = a5*x + a4*x + a3*x + a2*x + a1*x + a0 0 <= ai < 4 0 <= a5 < 2 (because num is 11 bits) Dividing by 3 is equivalent to dividing by (x-1), which can be done by normal polynomial division. 4 3 2 1 result = b4*x + b3*x + b2*x + b1*x + b0 + lookup[bn1]; b4 = a5 b3 = a4 - b4 = a4 - a5 b2 = a3 - b3 = a3 - a4 + a5 b1 = a2 - b2 = a2 - a3 + a4 - a5 b0 = a1 - b1 = a1 - a2 + a3 - a4 + a5 bn1 = a0 - b0 = a0 - a1 + a2 - a3 + a4 - a5 Now -7 <= bn1 <= 9, so the lookup table will be small. Remainder can be calcuated in a much simplier way since 4 mod 3 = 1 which means that you can drop all the x**n terms. num mod 3 = (a5 + a4 + a3 + a2 + a1 + a0) mod 3 Repeat the additions until the result fits in two bits, then map 3 to 0. These operations are of some interest in vector machines since one would like to use all banks of memory regardless of the vector stride. The remainder mod 3 gives the bank number and the result of the division gives the offset in the bank. Then any stride which is not divisable by 3 will evenly hit the banks of memory. A prime number works well because it is unlikely to have a factor in common with the stride. Numbers which differ by 1 from a power of 2 are easier to perform the calcuations for, which leaves you with choices of 3 5 7 17 and 31 for number of banks. I believe this was used in the BSP (Burroughs Scientific Processor) with 17 way interleaving. Approximate methods can be used at the expense of not using some fraction of available memory. Whether or not it is worth it depends on how often strides divisable by a power of 2 actually happen. I don't know. Ken McElvain Silicon Compiler Systems decwrl!sci!kenm