Path: utzoo!utgpu!water!watmath!clyde!bellcore!decvax!ucbvax!agate!ig!uwmcsd1!bbn!husc6!mit-eddie!ll-xn!ames!claris!apple!bcase
From: bcase@apple.UUCP (Brian Case)
Newsgroups: comp.arch
Subject: Re: conditional branches
Message-ID: <7384@apple.UUCP>
Date: 12 Feb 88 19:40:53 GMT
References: <191@telesoft.UUCP> <1556@gumby.mips.COM>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Apple Computer Inc., Cupertino, USA
Lines: 43

In article <1556@gumby.mips.COM> earl@mips.COM (Earl Killian) writes:
>The problem with this is that it makes a conditional a two instruction
>sequence. With what we know about instruction frequencies, this is
>not a good idea. The abstract concept of "compare two values and
>branch accordingly" represents around 15% of the instructions executed
>on a computer. To make this take two instructions increases your
>instruction count by 15% (and thus, in many cases, your cycle count by
>15%).
>
>Packaging the abstract concept of conditional branching into one
>instruction rather than two is one of the numerous ways that some RISC
>machines go faster than other architectures. What I've found quite
>surprising is that some recent architectures (e.g. Berkeley RISC and
>its clone, SPARC), used condition codes after studying instruction
>stream statistics. I suppose compare and branch in one instruction
>was considered too difficult, at the time.

The Am29000 provides at least one other variation: conditional branch
instructions look at only the most significant bit of a register to
decide whether or not to branch. Thus, this architecture requires two
instructions to accomplish compare/branch, but the result of the
compare is a data value like any other. (Note that a test for 32-bit
twos-complement negative is "free." This comes in handy, very handy,
for simulation of other architectures!) The computation of the boolean
is subject to many optimizations, not the least of which is scheduling.
The compare instruction often ends up in a branch or load/store delay
slot (sorry, I don't have numbers for this).

As Earl points out, however, there is no question that a savings can be
had by providing compare/branch; the savings is mostly space, in my
opinion. I can tell you that there isn't time in the current Am29000
implementation for a compare in the register read pipeline stage.
There might be time for a zero-detection, but I doubt it. As I
understand it, the MIPS pipeline does have a zero-detection in the
register read stage, and I can believe having it is a win.

For the Am29000, compare/branch would require either: (1) a slower
clock and unbalanced pipe, or (2) an extra delay slot or pipe bubble
for branches. Neither of these options (I believe) is better than a
two-instruction compare and branch.

Different architectures require different trade-offs to ensure "good"
implementations. The Am29000 register file, just like the MIPS
compare/branch, is a win. Who wins more? Oh please, not that again!
:-) :-)

    bcase