Path: utzoo!utgpu!watmath!att!tut.cis.ohio-state.edu!cica!ctrsol!uakari.primate.wisc.edu!csd4.milw.wisc.edu!bionet!ames!amdcad!cayman!tim
From: tim@cayman.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: delayed branch
Message-ID: <26667@amdcad.AMD.COM>
Date: 8 Aug 89 21:33:54 GMT
References: <828@eutrc3.urc.tue.nl>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices, Austin, TX
Lines: 37

In article <828@eutrc3.urc.tue.nl> rcpieter@rc4.urc.tue.nl writes:
| Just wondering---
| - What happens on existing processors which use delayed branches when the
| instruction put in the branch instruction's shadow is also a branch?

You get what we term a "visit".  The processor executes the single
instruction at the first branch's target, then continues at the target
of the second (delay-slot) branch.  The second branch does not have a
physical delay slot following it; rather, the instruction at the first
branch's target acts as the second branch's delay slot.

This does have its uses -- for example, a debugger can replace an
instruction with a specific breakpoint instruction, saving the replaced
instruction elsewhere.  When it comes time to restart execution, the
debugger can execute the replaced instruction "out-of-line" with a
visit, allowing the breakpoint instruction to physically remain at the
breakpoint location.

| - Do existing processors have a separate adder for use by branches only, or
| are there restrictions on the possible instructions which can be put in the
| shadow?
| - Are there any processors which use branch delaying and don't have a
| separate address calculation adder?

I know of no processors that have delayed branches and also restrict the
types of instruction allowed in the delay slot.  However, there really
isn't any resource contention between a branch and its delay slot: the
delay-slot instruction is only being decoded at the time the branch
executes.

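As a rough illustration of the "visit" described above, here is a minimal
Python sketch of a toy machine with one branch delay slot.  Everything in
it is an assumption made for illustration: the pc/npc register pair, the
`jmp`/`op`/`halt` instruction names, and all addresses are hypothetical
and are not the Am29000 instruction set.  The second program models the
debugger case: the breakpoint stays at address 5 while the saved
instruction at address 100 is executed out-of-line.

```python
def run(program, start=0, max_steps=50):
    """Return the addresses executed, in order, with one delay slot."""
    pc, npc = start, start + 1
    trace = []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        if op == "halt":
            break
        # The instruction at npc (the delay slot) always runs next; a
        # taken jump only redirects the instruction after that one.
        next_npc = arg if op == "jmp" else npc + 1
        pc, npc = npc, next_npc
    return trace


# A branch sitting in the delay slot of another branch.
visit_program = {
    0: ("jmp", 10),    # first branch
    1: ("jmp", 20),    # second branch, in the first branch's delay slot
    2: ("op", None),   # never reached
    10: ("op", None),  # first target: the single "visited" instruction
    11: ("op", None),  # never reached
    20: ("op", None),  # second target: execution continues here
    21: ("halt", None),
}
visit_trace = run(visit_program)

# Hypothetical debugger restart: address 5 holds the breakpoint, and the
# instruction it replaced has been saved at address 100.
debug_program = {
    5: ("op", None),    # breakpoint instruction, left in place
    6: ("op", None),    # instruction following the breakpoint
    7: ("halt", None),
    50: ("jmp", 100),   # restart stub: visit the saved instruction...
    51: ("jmp", 6),     # ...then resume just past the breakpoint
    100: ("op", None),  # saved copy of the replaced instruction
}
debug_trace = run(debug_program, start=50)

print(visit_trace)
print(debug_trace)
```

In the first trace only address 10 is executed at the first target before
control moves to 20; in the second, address 5 never appears, so the
breakpoint instruction is not re-executed on restart.
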
The reason a processor may have a separate adder for calculating
PC-relative branch target addresses is that it must perform an
instruction cache lookup during execution of the branch, and thus must
have the address calculated by the end of the decode stage (at least,
this is the way it works on the Am29000).

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)