Path: utzoo!utgpu!water!watmath!clyde!bellcore!decvax!ucbvax!hplabs!pyramid!prls!mips!earl
From: earl@mips.COM (Earl Killian)
Newsgroups: comp.arch
Subject: Re: conditional branches
Message-ID: <1610@gumby.mips.COM>
Date: 18 Feb 88 02:00:13 GMT
References: <191@telesoft.UUCP> <1556@gumby.mips.COM> <375@imagine.PAWL.RPI.EDU>
Lines: 35
In-reply-to: jesup@pawl1.pawl.rpi.edu's message of 16 Feb 88 08:39:17 GMT

In article <375@imagine.PAWL.RPI.EDU> jesup@pawl1.pawl.rpi.edu (Randell E. Jesup) writes:

   Think about compare & branch from a hardware point of view.  To do
   it in one cycle, you must fetch two values, run them through the
   ALU, and get the result.  Now you have the information that allows
   you to determine whether to branch.  You must also determine the
   branch destination.  This may also require some computation, an
   addition to the PC (though it might be speeded a little by knowing
   the offset is some small number of bits, which it has to be given a
   32-bit instruction.)  If you're willing to build another fast adder
   for this computation and run it in parallel, you MIGHT be able to
   pull it off, though I doubt it.  It would cost LOTS of chip area,
   and would probably be your critical path that determines your cycle
   time (certainly it would be if you didn't have a parallel adder!)

Hardware makes things go faster.  That's why RISC machines tend to
have more hardware in them than CISCs (they find room for the extra
hardware by tossing out the firmware, for a net savings).  It is
perfectly reasonable to dedicate an adder to computing
PC+branchdisplacement on every instruction (not just branch
instructions), and selecting between that and PC+1 based on the branch
decision.  Perfectly reasonable because that one adder just added 10%
to your performance.
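
To make the datapath concrete, here is a rough sketch in C of the
next-PC selection described above.  It is only a model under stated
assumptions: the PC counts instructions (so the fall-through is PC+1,
as in the text), the displacement is assumed to fit in 16 bits, and
the names next_pc, is_branch and taken are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the next-PC logic.  The PC is counted in
   instructions, and the 16-bit displacement width is an assumption. */
uint32_t next_pc(uint32_t pc, int32_t disp, bool is_branch, bool taken)
{
    /* The displacement must fit in the assumed 16-bit field. */
    assert(disp >= -32768 && disp <= 32767);

    /* Incrementer: the fall-through address. */
    uint32_t fallthrough = pc + 1;

    /* Dedicated adder: computed for every instruction, branch or not.
       Unsigned arithmetic wraps, so negative displacements work. */
    uint32_t target = pc + (uint32_t)disp;

    /* The branch decision arrives late and only drives the select. */
    return (is_branch && taken) ? target : fallthrough;
}
```

Both candidate addresses are ready before the comparison resolves, so
the decision only has to steer a two-way select rather than start an
add.  For example, next_pc(100, 8, true, true) gives 108, and the same
call with taken false gives 101.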

Branch decisions can have practically the same timing constraints as
load/store instructions in a simple pipeline; if you can do the
address add for the load/stores, then you can do the branch decision.
The details depend on your pipeline.  The MIPS R2000 pipeline is not
quite as generous to branch decisions as a simple pipeline because it
has virtual to physical translation in series with cache access, which
is why it leaves out the X < Y compare and branch.  It does do X = Y
and X ? 0 (any comparison against zero), which are most of the compare
and branches.  The end result
is that the MIPS architecture is about 10% more efficient than
condition-code architectures from branches alone (i.e. needs to
execute 10% fewer instructions).
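
The instruction-count argument can be sketched as a small C model.
This is an illustration, not a measurement: the function and type
names are hypothetical, and the condition-code side is deliberately
simplified to an explicit compare followed by a conditional branch,
ignoring codes that an earlier arithmetic instruction may have set
for free.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REL_EQ, REL_NE, REL_LT, REL_LE, REL_GT, REL_GE } rel_t;

/* Instructions needed to branch on "x REL y" when the only
   compare-and-branch forms are x == y, x != y, and x REL 0. */
int insns_compare_and_branch(rel_t rel, bool y_is_zero)
{
    assert(rel >= REL_EQ && rel <= REL_GE);

    if (rel == REL_EQ || rel == REL_NE)
        return 1;               /* equality: one instruction */
    if (y_is_zero)
        return 1;               /* any comparison against zero */

    /* General ordering: set a register from the less-than test,
       then branch on whether that register is zero. */
    return 2;
}

/* Simplified condition-code model (an assumption): an explicit
   compare followed by a conditional branch. */
int insns_condition_code(void)
{
    return 2;
}
```

Under this model only the general X < Y family costs two instructions
on the compare-and-branch side, while every case costs two on the
condition-code side.  The actual saving depends on how often each
kind of comparison occurs in real programs.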