Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!husc6!rutgers!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: comp.arch
Subject: Re: AM29000 Booleans [numbers; long]
Message-ID: <369@winchester.UUCP>
Date: Wed, 6-May-87 21:02:22 EDT
Article-I.D.: winchest.369
Posted: Wed May 6 21:02:22 1987
Date-Received: Sat, 9-May-87 01:55:58 EDT
References: <1270@aw.sei.cmu.edu> <16560@amdcad.AMD.COM>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 90

In article <16560@amdcad.AMD.COM> tim@amdcad.UUCP (Tim Olson) writes:
>In article <1270@aw.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>| The machine has comparison instructions that
>| yield a Boolean result in a register.  The
>| processor description says that TRUE is
>| represented by a 1 in the MOST significant
>| bit.  Is this a typo?
>
>No, it is not a typo.  One of the restrictions to the Boolean
>representation is that it had to be a single bit to allow quick
>determination of the target of a conditional jump.  Given this, it could
>either be placed in the most-significant bit (msb) or the
>least-significant bit (lsb).  The code generated for either of these
>representations is, in general, equivalent in terms of code space and
>cycles, except for two cases: the msb representation has the benefit of
>a quick "jump on negative", while the lsb representation "looks like a C
>Boolean", i.e. it has the correct value when a Boolean is assigned to a
>variable.
>
>For example, the code sequences that are generated for these cases are:
>	if (x < 0) .....	(generated code)
>	x = (y < z);		(generated code)
>
>Since the first code sequence is *much* more prevalent than the second in
>typical C code, it is better to "optimize" the first sequence at the
>expense of the second.

This is clearly correct [i.e., optimize for more frequent case], although
I suspect compiler writers may moan a little.
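As a rough illustration of the trade-off described in the quoted text, here is a
minimal C sketch of an msb Boolean. It is a model only, not 29K code: the
function names (cmp_lt_msb, jump_taken, branch_if_negative, assign_lt,
check_assign) are hypothetical, and it assumes a 32-bit register in which a
compare yields either 0x80000000 or 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a compare yields TRUE as a 1 in the msb (assumed 32-bit). */
uint32_t cmp_lt_msb(int32_t y, int32_t z)
{
    return (y < z) ? UINT32_C(0x80000000) : UINT32_C(0);
}

/* A conditional jump only has to inspect the msb of a register. */
int jump_taken(uint32_t reg)
{
    return (reg & UINT32_C(0x80000000)) != 0;
}

/* if (x < 0): the sign bit of x already is an msb Boolean, so no
 * separate compare is needed before the jump. */
int branch_if_negative(int32_t x)
{
    return jump_taken((uint32_t)x);
}

/* x = (y < z): the msb Boolean needs one logical shift to become the
 * 0/1 value that C expects. */
int32_t assign_lt(int32_t y, int32_t z)
{
    return (int32_t)(cmp_lt_msb(y, z) >> 31);
}

/* Sanity check: the shifted result agrees with C's own (y < z). */
void check_assign(int32_t y, int32_t z)
{
    assert(assign_lt(y, z) == (y < z));
}
```

In this model the "jump on negative" case costs nothing beyond the jump itself,
while the assignment case pays for the extra shift.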

However, it does raise an interesting question: was it not possible with
the 29K pipeline to offer the "other" fast branches, i.e., those that do
no arithmetic comparison, but that include the following set:

	beqz	beq (compares 2 regs)
	bne (compares 2 regs)	bnez
	bgtz	blez
	bgez*	bltz*

The *'d ones are the ones equivalent to the 29K's instructions.

Here is some data: over a set of 12 programs [as, ccom, compress,
dhrystone, espresso, hspice, nroff, terse, uopt, whetd, whets, timberwolf,
i.e., mostly large real programs], we get the following data for dynamic
frequencies, as percentage of the instruction cycles [not cache/TLB miss]:

			mean	min	max
beqz/bnez		8.7%	3.7%	16.7%	A
bltz/bgez		0.83%	<0.1%	4.6%	B [29K equiv]
beq,bne,bgtz,blez	3.6%	1.3%	8.4%	C

We have a set of compare instructions that let one materialize all of the
combinations.  The numbers for them are:

setlessthans		3.2%	1.1%	8.0%	D

In general, in most cases, such instructions are shortly followed by
a beqz/bnez [I'll ignore the cases where one is just computing a 0 or 1].
Thus: E = % (beqz/bnez used WITHOUT compare) = A - D = 5.5%.

Let us grossly estimate the average instruction cycle hit [for us, as usual,
does not necessarily apply to anyone else] for several design choices.

Case 1: what increase in instruction cycles would we get if we didn't have the
fast branches [but instead, used compare + test condition code]:
	= (A - D) + B + C = 5.5% + .8% + 3.6% = 9.9%, i.e., almost a 10% hit.

Case 2: how much would we get back if we added bltz/bgez back:
	= B = .8%

As usual, there are all sorts of caveats about special cases, but I think
this is a reasonable estimate.

Bottom line:
a) Having the bltz/bgez (alone) is worth about .8%, which is at least
a modest win, and is probably worth the minor hassle of the shift to get the
1-bit back.
b) Not having the other fast branches is about a 9% hit.
If the cycle time is improved that much by not supporting them [unlikely,
but possible], then not having them is a win; otherwise, it would have been
better to do the full set, and then the 1-bit can go back in the other end
of the register.
-- 
-john mashey	DISCLAIMER:
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
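
The Case 1 / Case 2 estimates in the body of the post can be reproduced with a
few lines of C. This is only a sketch of the arithmetic: the constants are the
mean dynamic frequencies quoted in the post, the function names are
hypothetical, and it keeps the post's simplifying assumption that every
set-less-than is followed by a beqz/bnez. It uses B = 0.83 unrounded, so Case 1
comes out slightly above the post's rounded figure.

```c
#include <assert.h>

/* Mean dynamic frequencies from the post, in percent of instruction cycles. */
static const double A = 8.7;   /* beqz/bnez */
static const double B = 0.83;  /* bltz/bgez (the 29K-equivalent branches) */
static const double C = 3.6;   /* beq, bne, bgtz, blez */
static const double D = 3.2;   /* set-less-than compares */

/* E: beqz/bnez used WITHOUT a preceding compare. Assumes every compare is
 * followed by a beqz/bnez, so D cannot exceed A. */
double branches_without_compare(void)
{
    assert(A >= D);
    return A - D;
}

/* Case 1: extra cycles if none of the fast branches existed. */
double case1_penalty(void)
{
    return branches_without_compare() + B + C;
}

/* Case 2: cycles recovered by adding only bltz/bgez back. */
double case2_recovered(void)
{
    return B;
}

/* Bottom line b): cost of lacking the other fast branches. */
double other_branches_penalty(void)
{
    return case1_penalty() - case2_recovered();
}
```

Calling other_branches_penalty() gives roughly 9.1, which matches the "about a
9% hit" in bottom line b).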