Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!samsung!sol.ctr.columbia.edu!cica!iuvax!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!aglew
From: aglew@crhc.uiuc.edu (Andy Glew)
Newsgroups: comp.arch
Subject: Re: a style question
Message-ID: <AGLEW.90Oct7203157@dwarfs.crhc.uiuc.edu>
Date: 8 Oct 90 01:31:57 GMT
References: <7341@darkstar.ucsc.edu> <1990Sep30.050655.13212@zoo.toronto.edu>
	<1990Sep30.172917.2951@Neon.Stanford.EDU>
	<DAVIS.90Oct1025438@pacific.mps.ohio-state.edu>
	<1990Oct2.151644.1581@phri.nyu.edu> <MEISSNER.90Oct2140411@osf.osf.org>
	<41876@mips.mips.COM> <
Sender: news@ux1.cso.uiuc.edu (News)
Followup-To: comp.arch
Organization: Center for Reliable and High-Performance Computing University of
	Illinois at Urbana Champaign
Lines: 66
In-Reply-To: aglew@crhc.uiuc.edu's message of 2 Oct 90 16:54:05

..> Q: Are there machines where test for equality is faster than test for <, etc.?
..> A: yes.  John Mashey explains, wrt. the MIPS R3000:
>
>1.	branch on a == b	2 registers
>2.	branch on a != b	""
>3.	branch on a <= 0	register & 0
>4.	branch on a > 0		""
>5.	branch on a < 0		""
>6.	branch on a >= 0	""
>
>But not
>7,8.	branch on a < b, or a <= b
>
>...
>	
>	The timing was tight enough, for us, that a subtract as part of
>	a compare-and-branch would have lengthened the cycle time,
>	or would have added another branch delay cycle, losing
>	more than the compensation gained by having the extra instruction.


Let me harp on one of my favorite points: carry propagation is evil,
and gets more evil as we move towards 64 bit machines.
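
To make the carry point concrete, here is a rough C sketch of what
each kind of test has to compute. The function names are mine, made up
for illustration, and this models the logic only, not any real
machine's datapath. Operands are passed as raw 32-bit patterns.

```c
#include <stdint.h>

/* Equality: bitwise XOR, then a zero-detect over the result.
   Each bit position is compared independently; no carry chain. */
static int branch_eq(uint32_t a, uint32_t b)
{
    return (a ^ b) == 0;
}

/* Signed a < b: a full-width subtract, then the sign of the
   difference corrected for signed overflow.  The top bit of diff
   depends on a borrow that can ripple up from bit 0. */
static int branch_lt(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;                /* wraps modulo 2^32 */
    uint32_t ovf  = (a ^ b) & (a ^ diff); /* top bit set on signed overflow */
    return (int)(((diff ^ ovf) >> 31) & 1u);
}
```

The equality test never needs information to travel from the low bits
to the high bits; the less-than test does, and that path gets longer
as the word gets wider.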


Besides carry, there is some small potential for making 

1z.	branch on a == 0	1 register
2z.	branch on a != 0	""

along with 3-6 above, faster than both the class of 1 and 2 and the
class of 7 and 8.

1z, 2z, and 3-6 can be precomputed, either at the time the value is
created or in a bit of slack time shortly thereafter[*], and the
result encoded in only a few bits: 6 (one per condition) if you want
to be really lazy and only have to select a single bit at test time,
fewer if you are willing to do a bit of encoding (more if you worry
about signed/unsigned distinctions).

These precomputed branch conditions can be stored as a few bits
associated with the register.  A logic function of only a few bits is
usually faster than a function of many bits (as is required at branch
time when comparing two full registers for (in)equality).
    Moreover, these few bits can be placed with the branch logic,
rather than with the rest of the register next to the ALU.
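
A minimal C sketch of the idea, assuming the lazy one-bit-per-condition
encoding. The names (struct reg, reg_write, branch_taken, the COND_*
bits) are hypothetical and only model the behavior; in hardware the
condition bits would be computed alongside or just after the result.

```c
#include <stdint.h>

/* Hypothetical one-hot encoding: one bit per zero-comparison. */
enum cond {
    COND_EQZ = 1 << 0,  /* 1z: a == 0 */
    COND_NEZ = 1 << 1,  /* 2z: a != 0 */
    COND_LEZ = 1 << 2,  /* 3:  a <= 0 */
    COND_GTZ = 1 << 3,  /* 4:  a >  0 */
    COND_LTZ = 1 << 4,  /* 5:  a <  0 */
    COND_GEZ = 1 << 5   /* 6:  a >= 0 */
};

struct reg {
    int32_t value;  /* the full-width register contents */
    uint8_t cond;   /* precomputed condition bits, kept with the branch logic */
};

/* Precompute all six conditions when the value is written. */
static struct reg reg_write(int32_t v)
{
    struct reg r;
    r.value = v;
    r.cond  = 0;
    r.cond |= (v == 0) ? COND_EQZ : COND_NEZ;
    r.cond |= (v <= 0) ? COND_LEZ : COND_GTZ;
    r.cond |= (v <  0) ? COND_LTZ : COND_GEZ;
    return r;
}

/* At branch time, select a single bit; the full value is never read. */
static int branch_taken(const struct reg *r, enum cond c)
{
    return (r->cond & c) != 0;
}
```

The six bits are redundant: a zero bit and a sign bit determine all of
them, at the cost of a gate or two of decoding at test time.  That is
the "fewer if you are willing to do a bit of encoding" tradeoff.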

Needless to say, you wouldn't do this unless you really had to.  It
can shave a bit off your cycle time for branches, but you don't need
to do that unless branches are your critical path.

As usual, simulate the tradeoffs. (Hint: scientific code does okay,
pattern matching code tends to have tests for equality more often).


[*] you wouldn't want to increase the latency of the results 
    in order to do this precomputation.
--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]