Path: utzoo!utgpu!water!watmath!clyde!bellcore!decvax!ucbvax!agate!ig!uwmcsd1!bbn!husc6!mit-eddie!ll-xn!ames!claris!apple!bcase
From: bcase@apple.UUCP (Brian Case)
Newsgroups: comp.arch
Subject: Re: conditional branches
Message-ID: <7384@apple.UUCP>
Date: 12 Feb 88 19:40:53 GMT
References: <191@telesoft.UUCP> <1556@gumby.mips.COM>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Apple Computer Inc., Cupertino, USA
Lines: 43

In article <1556@gumby.mips.COM> earl@mips.COM (Earl Killian) writes:
>The problem with this is that it makes a conditional a two instruction
>sequence.  With what we know about instruction frequencies, this is
>not a good idea.  The abstract concept of "compare two values and
>branch accordingly" represents around 15% of the instructions executed
>on a computer.  To make this take two instructions increases your
>instruction count by 15% (and thus, in many cases, your cycle count by
>15%).
>
>Packaging the abstract concept of conditional branching into one
>instruction rather than two is one of the numerous ways that some RISC
>machines go faster than other architectures.  What I've found quite
>surprising is that some recent architectures (e.g. Berkeley RISC and
>its clone, SPARC), used condition codes after studying instruction
>stream statistics.  I suppose compare and branch in one instruction
>was considered too difficult, at the time.

The Am29000 provides at least one other variation:  conditional branch
instructions look at only the most significant bit of a register to decide
whether or not to branch.  Thus, this architecture requires two instructions
to accomplish compare/branch, but the result of the compare is a data value
like any other.  (Note that the test for a negative 32-bit two's-complement
value is "free."  This comes in handy, very handy, for simulating other
architectures!)  The computation of the boolean is subject to many
optimizations, not the least of which is scheduling.  The compare
instruction often ends up in a branch or load/store delay slot (sorry, I
don't have numbers for this).
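
A minimal sketch of the two-step scheme described above, in Python rather
than Am29000 assembly.  The function names are hypothetical, and the encoding
of the compare result (top bit set for true, zero for false) is an assumption
made for illustration; the only property taken from the text is that the
branch looks at nothing but the most significant bit of a register.

```python
MASK32 = 0xFFFFFFFF    # keep values to 32 bits
SIGN_BIT = 0x80000000  # most significant bit of a 32-bit word


def to_signed32(x):
    """Interpret a 32-bit word as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & SIGN_BIT else x


def compare_lt(a, b):
    """Hypothetical compare: returns an ordinary data word.

    Assumed encoding: top bit set if a < b (signed), otherwise zero.
    """
    return SIGN_BIT if to_signed32(a) < to_signed32(b) else 0


def branch_taken(reg):
    """Hypothetical conditional branch: tests only the top bit of reg."""
    return (reg & SIGN_BIT) != 0


# Two-step compare/branch: the boolean lives in a register in between,
# so the compare can be scheduled well ahead of the branch.
flag = compare_lt(3, 7)
print(branch_taken(flag))

# The "free" negative test: branch directly on the data value itself.
minus_one = 0xFFFFFFFF
print(branch_taken(minus_one))
```

Because the flag is just a word in a register, nothing stops other
instructions from being placed between the compare and the branch.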

As Earl points out, however, there is no question that providing
compare/branch saves something; the savings are mostly in code space, in my
opinion.
I can tell you that there isn't time in the current Am29000 implementation
for a compare in the register read pipeline stage.  There might be time for
zero detection, but I doubt it.  As I understand it, the MIPS pipeline
does have zero detection in the register read stage, and I can believe
having it is a win.  For the Am29000, compare/branch would require either:
(1) a slower clock and unbalanced pipe, or (2) an extra delay slot or pipe
bubble for branches.  Neither of these options (I believe) is better than
a two-instruction compare and branch.  Different architectures require
different trade-offs to ensure "good" implementations.  The Am29000 register
file, just like the MIPS compare/branch, is a win.  Who wins more?  Oh please,
not that again!  :-) :-)
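
For anyone who wants to play with the trade-off, here is a rough
back-of-the-envelope model in Python.  Only the 15% branch frequency comes
from Earl's article; the function names, the "hidden" fraction, and the clock
stretch are hypothetical knobs, not measurements of any real machine.  Times
are relative to an idealized one-instruction compare/branch with no penalty.

```python
BRANCH_FRAC = 0.15  # fraction of executed instructions, from the quoted article


def two_instruction(branch_frac=BRANCH_FRAC, hidden=0.0):
    """Separate compare and branch.

    hidden is the assumed fraction of compares that land in a delay slot
    that would otherwise be wasted, and so cost nothing extra.
    """
    return 1.0 + branch_frac * (1.0 - hidden)


def fused_with_bubble(branch_frac=BRANCH_FRAC, bubble_cycles=1):
    """One-instruction compare/branch that costs extra cycles per branch."""
    return 1.0 + branch_frac * bubble_cycles


def fused_slow_clock(clock_stretch):
    """One-instruction compare/branch paid for by a longer cycle time.

    clock_stretch is the assumed fractional increase in cycle time,
    which applies to every instruction, not just branches.
    """
    return 1.0 + clock_stretch


print(two_instruction(hidden=0.0))   # no compare is hidden
print(two_instruction(hidden=0.5))   # half the compares are hidden
print(fused_with_bubble())           # one bubble per branch
print(fused_slow_clock(0.10))        # 10% slower clock (made-up figure)
```

In this toy model an extra bubble per branch costs exactly what an unhidden
compare costs, so any compare that can be scheduled into a wasted slot tips
the balance toward the two-instruction sequence.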

    bcase