Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!samsung!sol.ctr.columbia.edu!cica!iuvax!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!aglew
From: aglew@crhc.uiuc.edu (Andy Glew)
Newsgroups: comp.arch
Subject: Re: a style question
Message-ID:
Date: 8 Oct 90 01:31:57 GMT
References: <7341@darkstar.ucsc.edu> <1990Sep30.050655.13212@zoo.toronto.edu> <1990Sep30.172917.2951@Neon.Stanford.EDU> <1990Oct2.151644.1581@phri.nyu.edu> <41876@mips.mips.COM> <
Sender: news@ux1.cso.uiuc.edu (News)
Followup-To: comp.arch
Organization: Center for Reliable and High-Performance Computing University of Illinois at Urbana Champaign
Lines: 66
In-Reply-To: aglew@crhc.uiuc.edu's message of 2 Oct 90 16:54:05

..> Q: Are there machines where test for equality is faster than test for <, etc.?
..> A: yes. John Mashey explains, wrt. the MIPS R3000:

>
>1. branch on a == b      2 registers
>2. branch on a != b      ""
>3. branch on a <= 0      register & 0
>4. branch on a > 0       ""
>5. branch on a < 0       ""
>6. branch on a >= 0      ""
>
>But not
>7,8. branch on a < b, or a <= b
>
>...
>
> The timing was tight enough, for us, that a subtract as part of
> a compare-and-branch would have lengthened the cycle time,
> or would have added another branch delay cycle, losing
> more than the compensation gained by having the extra instruction.

Letting me harp on one of my favorite harping points: carry
propagation is evil, and gets more evil as we move towards 64 bit
machines.

Besides carry, there is some small potential for making

    1z. branch on a == 0     1 register
    2z. branch on a != 0     ""

as well as 3-6 above, faster than the class of 1 and 2, as well as
faster than 7 and 8.

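To make the carry point concrete, here is a rough Python sketch. It is
not taken from Mashey's post or from the R3000 design: the function
names, the 32-bit WIDTH, and the bit-serial borrow loop are all
assumptions made for illustration. Equality needs only an XOR and a
zero-detect, and the compare-with-zero tests need only the sign bit
and a zero-detect, whereas a < b needs the borrow out of a full-width
subtract.

```python
WIDTH = 32                      # assumed register width for this sketch
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def branch_eq(a, b):
    """a == b: bitwise XOR, then a zero-detect. No carry chain involved."""
    return ((a ^ b) & MASK) == 0


def branch_ltz(a):
    """a < 0: just the sign bit of a two's-complement register."""
    return (a & SIGN) != 0


def branch_lez(a):
    """a <= 0: sign bit OR zero-detect; still no subtract."""
    return branch_ltz(a) or (a & MASK) == 0


def branch_lt_unsigned(a, b):
    """a < b (unsigned): the borrow out of a full-width subtract a - b.

    The borrow is rippled one bit at a time only to make the
    bit-to-bit dependency explicit.
    """
    borrow = 0
    for i in range(WIDTH):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        not_a = ai ^ 1
        same = (ai ^ bi) ^ 1    # 1 when the two bits are equal
        # Borrow out of bit i depends on the borrow into bit i.
        borrow = (not_a & bi) | (same & borrow)
    return borrow == 1


def branch_lt_signed(a, b):
    """a < b (signed): flip both sign bits, then compare as unsigned."""
    return branch_lt_unsigned((a ^ SIGN) & MASK, (b ^ SIGN) & MASK)
```

A real adder would use some form of lookahead rather than a ripple,
but the depth of that logic still grows (more slowly) with the word
width, which is the sense in which the problem gets worse at 64 bits.
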
1z, 2z, and 3-6 can be precomputed, either at the time the value is
created, or in a bit of slack time a little bit thereafter[*], and the
result encoded in only a few bits - 6 if you want to be really slack,
and only have to select a single bit at test time, fewer if you are
willing to do a bit of encoding (more if you worry about
signed/unsigned distinctions).

These precomputed branch conditions can be stored as a few bits
associated with the register. A logic function which is a function of
only a few bits is usually faster than a function of many bits (as is
required at branch time when comparing two variables for (in)equality).
Moreover, these few bits can be associated with the branch logic,
rather than with the rest of the register next to the ALU.

Needless to say, you wouldn't do this unless you really had to. It can
shave a bit off your cycle time for branches, but you don't really
need to do that unless that is your critical time.

As usual, simulate the tradeoffs. (Hint: scientific code does okay,
pattern matching code tends to have tests for equality more often).

[*] you wouldn't want to increase the latency of the results in order
to do this precomputation.

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]
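
The precomputed condition bits described above can be sketched in
Python as follows. This is a toy software model, not a description of
any real machine: the RegisterFile class, the flag names and their bit
positions, and the 32-bit signed-only WIDTH are all assumptions of the
sketch. It uses the "really slack" six-bit form, so the branch test
only selects a single bit.

```python
WIDTH = 32                      # assumed register width for this sketch
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)

# One flag bit per compare-with-zero condition (1z, 2z, 3-6).
# The bit positions are arbitrary choices for this sketch.
EQZ, NEZ, LEZ, GTZ, LTZ, GEZ = (1 << i for i in range(6))


def precompute_flags(value):
    """Compute all six compare-with-zero results when a value is created."""
    value &= MASK
    zero = value == 0
    neg = (value & SIGN) != 0
    flags = EQZ if zero else NEZ
    flags |= LTZ if neg else GEZ
    flags |= LEZ if (zero or neg) else GTZ
    return flags


class RegisterFile:
    """Registers with a few precomputed condition bits stored alongside."""

    def __init__(self, n=32):
        self.values = [0] * n
        self.flags = [precompute_flags(0)] * n

    def write(self, r, value):
        # The flags are filled in at (or shortly after) write time,
        # off the path that delivers the value itself.
        self.values[r] = value & MASK
        self.flags[r] = precompute_flags(value)

    def branch_taken(self, r, cond):
        # At branch time only one stored bit is examined; the full
        # register value is never looked at.
        return (self.flags[r] & cond) != 0
```

For example, after rf = RegisterFile() and rf.write(3, -5), the call
rf.branch_taken(3, LTZ) is true and rf.branch_taken(3, GEZ) is false.
In this signed-only model two stored bits (zero, negative) would be
enough to derive all six conditions, at the cost of a little logic at
test time.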