Path: utzoo!utgpu!watmath!att!tut.cis.ohio-state.edu!cica!iuvax!mailrus!wasatch!cs.utexas.edu!oakhill!joeg
From: joeg@oakhill.UUCP (Joe Gutierrez)
Newsgroups: comp.arch
Subject: Re: delayed branch
Message-ID: <2288@homeboy.oakhill.UUCP>
Date: 9 Aug 89 19:23:55 GMT
References: <2246@taux01.UUCP> <1996@se-sd.NCR.COM>
Reply-To: cs.utexas.edu!oakhill!jimk (Jim Klingshirn)
Organization: Motorola Inc., Austin Tx.
Lines: 43

>In article <2246@taux01.UUCP> cdddta@tasu76.UUCP (David Deitcher) writes:
>>"Delayed branch" is a technique used by RISC machines to make use of the
>>extra cycle needed to calculate branch targets. The compiler will put
>>an instruction after the branch to be executed by the CPU while the
>>branch target is being calculated. Does anyone have information as to
>>how often the compiler is able to put a useful instruction after the
>>branch as opposed to filling it with a NOP?
In article <1996@se-sd.NCR.COM> lord@se-sd.NCR.COM (Dave Lord ) writes:
>You mean in theory or in real life? :-) I've looked at code generated
>by three different compilers for the 88K (GreenHills, GNU, & LPI) and
>I don't believe any of them EVER put a useful instruction in the
>delayed branch slot. Admittedly the 88K is still pretty new and these
>were all early compilers. I suspect that the reason
>the delayed branch slots are not used is that the register allocators
>are not smart enough to hold a register after a branch.
>Hopefully this will change. Anyone have
>any idea what percentage of typical code is branches? It would be
>interesting to know how much performance could be gained
>by filling those slots.

Current 88000 compilers are quite effective in scheduling instructions
in the delay slot of branches if the compiler optimization flags are
enabled. But if the optimizers are not used, the delayed branch
instructions are not emitted.

In addition to tons of other statistics, we have measured the frequency
of branches and the branch delay slot utilization on several real
applications, including the C compiler, a Lisp interpreter, and several
Unix utilities. The applications were compiled with the Green Hills C
compiler using the -OLM level of optimization. The measurements were
made dynamically, by running the applications on an instruction set
simulator. The sample comprised over 400 million executed instructions.
Of these, 13% were taken branches, and 70% of the taken branches
executed an instruction in the delay slot.

Interestingly enough, the compiler doesn't always try to schedule
instructions in the delay slot of branches. If it has to choose between
scheduling an instruction in the delay slot of a conditional branch and
scheduling the same instruction in the delay slots of an unconditional
operation such as a load, a multiply or a floating point operation (the
cycles spent waiting for its result), it will choose the unconditional
operation.

Jim Klingshirn
Motorola Inc. (88000 Design Group)
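The figures above give a rough answer to Dave Lord's question about how much performance is at stake. Below is a back-of-the-envelope sketch in Python. It assumes exactly 400 million instructions (the post says "over 400 million") and a penalty of exactly one cycle for every taken branch whose delay slot goes unused, whether that cycle is a NOP or a pipeline bubble. Neither assumption comes from the post, and the helper name delay_slot_summary is hypothetical.

```python
# Figures quoted in the post. The sample was "over 400 million" instructions,
# so 400 million is used here as a round lower bound (an assumption).
INSTRUCTIONS = 400_000_000
TAKEN_FRACTION = 0.13   # taken branches as a fraction of all executed instructions
FILLED_FRACTION = 0.70  # fraction of taken branches that executed a delay-slot instruction


def delay_slot_summary(instructions, taken_fraction, filled_fraction):
    """Hypothetical helper: count taken branches and filled/unfilled delay
    slots, and estimate wasted cycles per executed instruction.

    Assumption (not from the post): each taken branch whose delay slot is
    unused costs exactly one extra cycle; a filled slot costs nothing extra.
    """
    taken = instructions * taken_fraction
    filled = taken * filled_fraction
    unfilled = taken - filled
    return {
        "taken": taken,
        "filled": filled,
        "unfilled": unfilled,
        # Wasted cycles per instruction at the measured fill rate.
        "wasted_per_insn": unfilled / instructions,
        # Wasted cycles per instruction if no delay slot were ever filled.
        "wasted_if_never_filled": taken / instructions,
    }


summary = delay_slot_summary(INSTRUCTIONS, TAKEN_FRACTION, FILLED_FRACTION)
print(f"taken branches:       {summary['taken'] / 1e6:.1f} million")
print(f"filled delay slots:   {summary['filled'] / 1e6:.1f} million")
print(f"unfilled delay slots: {summary['unfilled'] / 1e6:.1f} million")
print(f"wasted cycles/insn:   {summary['wasted_per_insn']:.3f} "
      f"(vs {summary['wasted_if_never_filled']:.3f} with no filling)")
```

Under these assumptions, the 30% of taken branches with an unused slot cost about 0.039 cycles per executed instruction, against 0.13 if no slot were ever filled. The measured fill rate would therefore recover roughly 0.09 cycles per instruction. How large a speedup that is depends on the machine's overall cycles per instruction, which the post does not give.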