Path: utzoo!utgpu!water!watmath!clyde!bellcore!decvax!ucbvax!hplabs!pyramid!prls!mips!earl
From: earl@mips.COM (Earl Killian)
Newsgroups: comp.arch
Subject: Re: conditional branches
Message-ID: <1610@gumby.mips.COM>
Date: 18 Feb 88 02:00:13 GMT
References: <191@telesoft.UUCP> <1556@gumby.mips.COM> <375@imagine.PAWL.RPI.EDU>
Lines: 35
In-reply-to: jesup@pawl1.pawl.rpi.edu's message of 16 Feb 88 08:39:17 GMT

In article <375@imagine.PAWL.RPI.EDU> jesup@pawl1.pawl.rpi.edu (Randell E. Jesup) writes:

   Think about compare & branch from a hardware point of view. To do
   it in one cycle, you must fetch two values, run them through the
   ALU, and get the result. Now you have the information that allows
   you to determine whether to branch. You must also determine the
   branch destination. This may also require some computation, an
   addition to the PC (though it might be speeded a little by knowing
   the offset is some small number of bits, which it has to be given a
   32-bit instruction.) If you're willing to build another fast adder
   for this computation and run it in parallel, you MIGHT be able to
   pull it off, though I doubt it. It would cost LOTS of chip area,
   and would probably be your critical path that determines your cycle
   time (certainly it would be if you didn't have a parallel adder!)

Hardware makes things go faster. That's why RISC machines tend to have
more hardware in them than CISCs (they find room for the extra hardware
by tossing out the firmware, for a net savings). It is perfectly
reasonable to dedicate an adder to computing PC + branch displacement
on every instruction (not just branch instructions), and to select
between that and PC+1 based on the branch decision. Perfectly
reasonable because that one adder just added 10% to your performance.

Branch decisions can have practically the same timing constraints as
load/store instructions in a simple pipeline; if you can do the address
add for the load/stores, then you can do the branch decision.
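
To make the dedicated-adder idea concrete, here is a minimal sketch in C
of the next-PC selection described above. It is only an illustration,
not a description of any real pipeline: the function name next_pc, the
word-granular PC (so the fall-through address is PC+1, as in the text),
and the signed displacement type are all assumptions made for the
example.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Hypothetical model of the next-PC logic. The PC counts instructions,
 * so the sequential successor is pc + 1. Both candidates are computed
 * for every instruction; the branch decision only drives the final
 * select, the way a multiplexer would in hardware.
 */
static uint32_t next_pc(uint32_t pc, int32_t displacement, bool take_branch)
{
    uint32_t fall_through = pc + 1;                 /* incrementer */
    uint32_t target = pc + (uint32_t)displacement;  /* dedicated adder */

    return take_branch ? target : fall_through;     /* select */
}
```

With pc = 100 and a displacement of -4, the sketch yields 96 when the
branch is taken and 101 when it is not. The point is that the target
add does not wait for the compare; only the select does.
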
The details depend on your pipeline. The MIPS R2000 pipeline is not
quite as generous to branch decisions as a simple pipeline because it
has virtual-to-physical translation in series with cache access, which
is why it leaves out the X < Y compare-and-branch. It does do X = Y
and X ? 0 (where ? is any relational operator), which cover most
compare-and-branch cases. The end result is that the MIPS architecture
is about 10% more efficient than condition-code architectures on
branches alone (i.e., it needs to execute about 10% fewer instructions).
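
As a rough illustration of why X = Y and X ? 0 are the cheap cases,
the following C sketch models the three kinds of branch condition. All
function names are hypothetical, and the comments about carry chains
reflect the usual explanation rather than anything stated in the post:
equality needs only a bitwise XOR and a zero detect, a comparison
against zero needs only the sign bit and a zero detect of one operand,
while X < Y needs a full subtract. The two-step version of X < Y
assumes a separate set-on-less-than instruction followed by a
branch-if-not-zero.

```c
#include <stdint.h>
#include <stdbool.h>

/* X = Y: XOR the operands and test for zero; no carry propagation. */
static bool cond_eq(uint32_t x, uint32_t y) { return (x ^ y) == 0; }

/* X ? 0: only the sign bit and a zero detect of one operand. */
static bool cond_ltz(int32_t x) { return ((uint32_t)x >> 31) != 0; }
static bool cond_lez(int32_t x) { return cond_ltz(x) || x == 0; }
static bool cond_gtz(int32_t x) { return !cond_lez(x); }
static bool cond_gez(int32_t x) { return !cond_ltz(x); }

/* X < Y: modeled as two instructions instead of one fused branch. */
static uint32_t set_less_than(int32_t x, int32_t y)
{
    return x < y ? 1u : 0u;             /* instruction 1: full compare */
}

static bool cond_lt_two_step(int32_t x, int32_t y)
{
    uint32_t t = set_less_than(x, y);
    return !cond_eq(t, 0);              /* instruction 2: branch if t != 0 */
}
```

In this model only the X < Y case costs a second instruction, which is
consistent with the claim above that the single-instruction forms cover
most compare-and-branch uses.
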