Path: utzoo!utgpu!water!watmath!clyde!bellcore!decvax!decwrl!nsc!taux01!orr From: orr@taux01.UUCP (Orr Michael ) Newsgroups: comp.arch Subject: Re: taken -vs- untaken branches Summary: much can, and is, being done, in SW & HW about branch prediction Keywords: compilers, branch predictors, delayed branches Message-ID: <436@taux01.UUCP> Date: 8 Jan 88 23:08:17 GMT References: <496@cresswell.quintus.UUCP> Reply-To: orr@taux01.UUCP (Orr Michael ) Organization: National Semiconductor (Israel) Ltd. Lines: 81 In article <496@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: >I thought that taken branches were always more costly than untaken >branches, so that one should try to arrange branches so that the >not-taken case was the commonest. There were some messages in this >group a while back which seems to suggest that in some newer RISCs >this was no longer the case. > >Would someone who understands such matters care to explain? (Hope I qualify. I am part of the compiler writing team of NS. I saw some responses, but not a general answer to the question.) > The relative cost of branches is usualy as stated - taken ones require overhead. In many RISC architectures, the branch is delayed, usually by 1 cycle. in this space, some useful instruction can be executed, in parallel with the branching overhead. in this way the cost of the branch is reduced. this depends on having something useful to do which can be done at that time. (actually this is not limited to RISC. you can do the same in CISC as well). >It would also be interesting to see a comparison of > - branch taken cache (processor keeps a cache of instructions > and predicts that they will/won't be taken if they were/weren't > the last time) I know of several approaches to this : * branch destinations cache - the HW stores a small # of instrucctions from destinations of branches that were taken. in this way, the next time you get to the same branch you have lower overhead. * HW branch predictors, either static by branch direction, or a sort of a learning state-machine, which predicts branches acc. to their direction, history, location or whatever. * build a double fetch mechanism. on every branch, fetch some small number of instructions from both possible paths, so when you know which way the branch went you have a lower overhead. this is most useful in pipelined processors, were you pre-fetch instructions. > - compiler-assigned fixed branch prediction bits Possible, but difficult to implement. for every heuristic telling what is more likely I can bring an opposite example. > - guessing that backward branches will be taken and forward ones won't This is pretty much what the 32532 HW branch predictor does. this works very well. Again this just lowers the overhead, but does not guarantee a lower cost. a taken branch, even if correctly predicted, may still be more costly than a non-taken one. The cost depends on the architecture & Its' implementation. > >Background: we have some code where a lot of time is going into a few >fairly short loops > L: > b L ; usually taken >and it would be fairly easy to unroll them a couple of times: > L: > b X ; usually NOT taken > > b X ; usually NOT taken > > b L ; usually taken > X: >but if non-taken branches are more costly than taken ones this would >be a bad idea, so it would be good to have some idea of on which >machines it would be a good idea and on which it wouldn't. As stated before, this is what is assumed by the HW in the NS32532. Our compiler (CTP optimizing compiler suite) knows this, and tries to arrange things accordingly. (e.g. when changing a DIV x,2**n into a sequence containing a conditional branch, I generate different code, with a different branch direction for the NS32532 than I do for other NS32XXX chips). both of the above cases directly match the type of code the '532 HW branch predictor assumes it will see most of the time. Michael Orr. -- Michael Orr orr@nsc.NSC.COM IBM moto: "machines should work, pepole should think" Orr's comment : Nither do. Disclaimer : Opinions, come home. All is forgiven. Papa.