Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!tut.cis.ohio-state.edu!rutgers!mephisto!ncar!ames!amdcad!nucleus!tim
From: tim@nucleus.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide
Summary: no jump required for signed int / power-of-2
Message-ID: <28568@amdcad.AMD.COM>
Date: 31 Dec 89 21:28:37 GMT
References: <84768@linus.UUCP> <4057@brazos.Rice.edu>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 65

In article <4057@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
| For most languages, it isn't safe to replace a divide by 2^N
| with a shift right (doesn't work for negative numbers).
| The same problem arises when replacing I % 4 with I & 3.
| Division and remainder can be strength reduced, but require introduction 
| of a branch.

Since C (specifically, the proposed ANSI Standard) specifies only that
the expression

	(a/b)*b + a%b == a

holds, the sign of a%b and the value for a/b when either operand is
negative is implementation defined.  Therefore, it is perfectly legal
to define that a%b is always positive.  If this method is used, then
a/b (where b is a positive power-of-2) can be replaced with an
arithmetic shift right.  For example:

Method 1:  5/2  == 2	 5%2 == 1	-5/2  == -2	-5%2 == -1
	   5>>1 == 2	 5&1 == 1	-5>>1 == -3	-5&1 ==  1
						  ^		 ^
						  |  don't match |


Method 2:  5/2  == 2	 5%2 == 1	-5/2  == -3	-5%2 ==  1
	   5>>1 == 2	 5&1 == 1	-5>>1 == -3	-5&1 ==  1
						  ^		 ^
						  |    match     |


IBM implemented this second form of / and % in their PL.8 compiler for
the ROMP chip, just so the above optimization could be performed.

However, division or modulo of a signed integer by a positive
power-of-2 can be optimized without a branch.  The technique is shown
below:

; perform a/b, where the sign of a is unknown,
; and b is a positive power-of-2.  b == 2**N

	; tmp = (a<0) ? b-1 : 0;
	sra	tmp, a, N-1	
	srl	tmp, tmp, 32-N

	; tmp = (tmp + a) >> N;
	add	tmp, tmp, a
	sra	tmp, tmp, N


; perform a%b, where the sign of a is unknown,
; and b is a positive power-of-2

	; tmp1 = (a<0) ? b-1 : 0;
	sra	tmp1, a, N-1
	srl	tmp1, tmp1, 32-N

	; tmp2 = ((tmp1 + a) & (b-1)) - tmp1;
	add	tmp2, tmp1, a
	and	tmp2, tmp2, b-1
	sub	tmp2, tmp2, tmp1


	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)