Path: utzoo!utgpu!watmath!att!tut.cis.ohio-state.edu!cica!ctrsol!uakari.primate.wisc.edu!csd4.milw.wisc.edu!bionet!ames!amdcad!cayman!tim
From: tim@cayman.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: delayed branch
Message-ID: <26667@amdcad.AMD.COM>
Date: 8 Aug 89 21:33:54 GMT
References: <828@eutrc3.urc.tue.nl>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices, Austin, TX
Lines: 37

In article <828@eutrc3.urc.tue.nl> rcpieter@rc4.urc.tue.nl writes:
| Just wondering---
|  - What happens on existing processors which use delayed branches when the
| instruction put in the branch instruction's shadow is also a branch?

You get what we term a "visit".  This executes the single instruction
at the first branch target, then continues with the second (delay-slot)
branch's target.  The second branch does not have a physical delay slot
following it; rather, the instruction at the first branch's target acts
as the second branch's delay slot.
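
To make the control flow concrete, here is a minimal sketch of a toy
machine with one branch delay slot, modelled as a PC / next-PC pair.
The instruction encoding, the word addresses and the names run and
VISIT_DEMO are made up for illustration; this is not Am29000 code, just
the sequencing described above.

```python
# Toy model (hypothetical encoding): each instruction is (opcode, argument).
# A taken "jmp" only changes the address fetched *after* the delay slot.

def run(program, start=0, max_steps=100):
    """Execute with one delay slot; return the list of addresses executed."""
    pc, npc = start, start + 1          # current and next fetch address
    trace = []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        if op == "halt":
            break
        next_npc = npc + 1              # default: sequential flow
        if op == "jmp":
            next_npc = arg              # takes effect after the delay slot
        pc, npc = npc, next_npc
    return trace

VISIT_DEMO = {
    0:  ("jmp", 10),                    # first branch
    1:  ("jmp", 20),                    # second branch, in the delay slot
    2:  ("op", "never reached"),
    10: ("op", "visited"),              # acts as the second branch's delay slot
    11: ("op", "skipped"),
    20: ("op", "B0"),
    21: ("op", "B1"),
    22: ("halt", None),
}

print(run(VISIT_DEMO))                  # [0, 1, 10, 20, 21, 22]
```

Only the single instruction at address 10 executes before control
continues at 20; address 11 is never fetched.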

This does have its uses -- for example, a debugger can replace an
instruction with a specific breakpoint instruction, saving the replaced
instruction elsewhere.  When it comes time to restart execution, the
debugger can execute the replaced instruction "out-of-line" with a
visit, allowing the breakpoint instruction to physically remain at the
breakpoint location.
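
A rough sketch of that debugger trick on a toy single-delay-slot
machine follows. Everything here is assumed for illustration: the
"break" opcode, the scratch address 100, the two-branch resume stub at
200/201, and the requirement that the saved instruction is not itself a
branch. How a real debugger gets the processor into the branch-pair
state is machine-specific and not modelled.

```python
# Toy model (hypothetical encoding): (opcode, argument), one delay slot.

def run(program, start, max_steps=100):
    """Execute with one delay slot; return the list of addresses executed."""
    pc, npc = start, start + 1
    trace = []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        if op in ("halt", "break"):     # stop and hand control back
            break
        next_npc = arg if op == "jmp" else npc + 1
        pc, npc = npc, next_npc
    return trace

PROGRAM = {
    4:   ("op", "a"),
    5:   ("break", None),               # breakpoint planted over ("op", "b")
    6:   ("op", "c"),
    7:   ("halt", None),
    100: ("op", "b"),                   # saved copy of the replaced instruction
    200: ("jmp", 100),                  # resume stub: visit the saved copy...
    201: ("jmp", 6),                    # ...then continue after the breakpoint
}

print(run(PROGRAM, start=4))            # [4, 5]  (stops at the breakpoint)
print(run(PROGRAM, start=200))          # [200, 201, 100, 6, 7]
```

The saved instruction runs exactly once at address 100, execution
continues at 6, and the breakpoint at 5 is never removed.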

|  - Do existing processors have a separate adder for use by branches only, or
| are there restrictions on the possible instructions which can be put in the
| shadow?
|  - Are there any processors which use branch delaying and don't have a
| separate address calculation adder?

I know of no processors that have delayed branches and also restrict
the types of instruction allowed in the delay slot.  In any case, there
really isn't any resource contention between a branch and its delay
slot: the delay slot is only being decoded while the branch executes.
The reason a processor may have a separate adder to calculate
PC-relative branch target addresses is that it must perform an
instruction cache lookup during branch execution, and thus must have
the target address calculated by the end of the decode stage (at least,
this is the way it works on the Am29000).

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)