Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!husc6!rutgers!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: comp.arch
Subject: Re: AM29000 Booleans [numbers; long]
Message-ID: <369@winchester.UUCP>
Date: Wed, 6-May-87 21:02:22 EDT
Article-I.D.: winchest.369
Posted: Wed May  6 21:02:22 1987
Date-Received: Sat, 9-May-87 01:55:58 EDT
References: <1270@aw.sei.cmu.edu> <16560@amdcad.AMD.COM>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 90

In article <16560@amdcad.AMD.COM> tim@amdcad.UUCP (Tim Olson) writes:
>In article <1270@aw.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:

>|  The machine has comparison instructions that
>|  yield a Boolean result in a register.  The
>|  processor description says that TRUE is
>|  represented by a 1 in the MOST significant
>|  bit.  Is this a typo?
>No, it is not a typo.  One of the restrictions to the Boolean
>representation is that it had to be a single bit to allow quick
>determination of the target of a conditional jump.  Given this, it could
>either be placed in the most-significant bit (msb) or the
>least-significant bit (lsb).  The code generated for either of these
>representations is, in general, equivalent in terms of code space and
>cycles, except for two cases: the msb representation has the benefit of
>a quick "jump on negative", while the lsb representation "looks like a C
>Boolean", i.e.  it has the correct value when a Boolean is assigned to a
>variable. 
>
>For example, the code sequences that are generated for these cases are:
>			if (x < 0) .....
	(generated code)
>			x = (y < z);
	(generated code)
>Since the first code sequence is *much* more prevalent than the second in
>typical C code, it is better to "optimize" the first sequence at the
>expense of the second.

This is clearly correct [i.e., optimize for more frequent case],
although I suspect compiler writers may moan a little.
However, it does raise an interesting question:  was it not possible
with the 29K pipeline to offer the "other" fast branches, i.e., those that
do no arithmetic comparison, but that include the following set:
beqz
beq	(compares 2 regs)
bne	(compares 2 regs)
bnez
bgtz
blez
bgez*
bltz*

The *'d ones are the ones equivalent to the 29K's instructions.
Here is some data: over a set of 12 programs [as, ccom, compress, dhrystone,
espresso, hspice, nroff, terse, uopt, whetd, whets, timberwolf, i.e.,
mostly large real programs], we get the following data for dynamic frequencies,
as percentage of the instruction cycles [not cache/TLB miss]:

		mean	min	max
beqz/bnez	8.7%	3.7%	16.7%	A
bltz/bgez	0.83%	<0.1%	 4.6%	B	 [29K equiv]
beq,bne,bgtz,
blez		3.6%	1.3%	 8.4%	C

We have a set of compare instructions that let one materialize all of
the combinations.  The numbers for them are:

setlessthans	3.2%	1.1%	 8.0%	D

In general, in most cases, such instructions are shortly followed by
a beqz/bnez [I'll ignore the cases where one is just computing a 0 or 1].
Thus: E = % (beqz/bnez used WITHOUT compare) = A - D = 5.5%.

Let us grossly estimate the average instruction cycle hit [for us, as usual,
does not necesarily apply to anyone else] for several design choices.

Case 1: what increase in instruction cycles would we get if we didn't
have the fast branches [but instead, used compare + test condition code]:
= (A - D) + B + C = 5.5% + .8% + 3.6% = 9.9%, i.e., almost a 10% hit.

Case 2: how much would we get back if we added bltz/bgez back:
= B = .8%

As usual, there are all sorts of caveats about special cases, but
I think this is a reasonable estimate.

Bottom line:
a) Having the bltz/bgez (alone) is worth about .8%, which is at least
a modest win, and is probably worth the minor hassle of the shift to
get the 1-bit back.

b) Not having the other fast branches is about a 9% hit.  If the
cycle time is improved that much by not supporting them [unlikely,
but possible], then not having them is a win, else, it would have better
to do the full set, and then the 1-bit can go back in the other end
of the register.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086