Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!tut.cis.ohio-state.edu!rutgers!mephisto!ncar!ames!amdcad!nucleus!tim From: tim@nucleus.amd.com (Tim Olson) Newsgroups: comp.arch Subject: Re: Integer Multiply/Divide Summary: no jump required for signed int / power-of-2 Message-ID: <28568@amdcad.AMD.COM> Date: 31 Dec 89 21:28:37 GMT References: <84768@linus.UUCP> <4057@brazos.Rice.edu> Sender: news@amdcad.AMD.COM Reply-To: tim@amd.com (Tim Olson) Organization: Advanced Micro Devices, Inc., Austin, Texas Lines: 65 In article <4057@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes: | For most languages, it isn't safe to replace a divide by 2^N | with a shift right (doesn't work for negative numbers). | The same problem arises when replacing I % 4 with I & 3. | Division and remainder can be strength reduced, but require introduction | of a branch. Since C (specifically, the proposed ANSI Standard) specifies only that the expression (a/b)*b + a%b == a holds, the sign of a%b and the value for a/b when either operand is negative is implementation defined. Therefore, it is perfectly legal to define that a%b is always positive. If this method is used, then a/b (where b is a positive power-of-2) can be replaced with an arithmetic shift right. For example: Method 1: 5/2 == 2 5%2 == 1 -5/2 == -2 -5%2 == -1 5>>1 == 2 5&1 == 1 -5>>1 == -3 -5&1 == 1 ^ ^ | don't match | Method 2: 5/2 == 2 5%2 == 1 -5/2 == -3 -5%2 == 1 5>>1 == 2 5&1 == 1 -5>>1 == -3 -5&1 == 1 ^ ^ | match | IBM implemented this second form of / and % in their PL.8 compiler for the ROMP chip, just so the above optimization could be performed. However, division or modulo of a signed integer by a positive power-of-2 can be optimized without a branch. The technique is shown below: ; perform a/b, where the sign of a is unknown, ; and b is a positive power-of-2. b == 2**N ; tmp = (a<0) ? b-1 : 0; sra tmp, a, N-1 srl tmp, tmp, 32-N ; tmp = (tmp + a) >> N; add tmp, tmp, a sra tmp, tmp, N ; perform a%b, where the sign of a is unknown, ; and b is a positive power-of-2 ; tmp1 = (a<0) ? b-1 : 0; sra tmp1, a, N-1 srl tmp1, tmp1, 32-N ; tmp2 = ((tmp1 + a) & (b-1)) - tmp1; add tmp2, tmp1, a and tmp2, tmp2, b-1 sub tmp2, tmp2, tmp1 -- Tim Olson Advanced Micro Devices (tim@amd.com)