Path: utzoo!utgpu!watmath!att!tut.cis.ohio-state.edu!mailrus!csd4.milw.wisc.edu!cs.utexas.edu!oakhill!marvin
From: marvin@oakhill.UUCP (Marvin Denman)
Newsgroups: comp.arch
Subject: Re: delayed branch
Message-ID: <2284@yogi.oakhill.UUCP>
Date: 9 Aug 89 15:07:00 GMT
References: <828@eutrc3.urc.tue.nl> <26667@amdcad.AMD.COM>
Reply-To: cs.utexas.edu!oakhill!marvin (Marvin Denman)
Organization: Motorola Inc., Austin Tx.
Lines: 51

In article <26667@amdcad.AMD.COM> tim@amd.com writes:
 >In article <828@eutrc3.urc.tue.nl> rcpieter@rc4.urc.tue.nl writes:
 >| Just wondering---
 >|  - What happens on existing processors which use delayed branches when the
 >| instruction put in the branch instruction's shadow is also a branch?

 >You get what we term a "visit".  This executes the single instruction
 >at the first branch target, then continues with the second (delay-slot)
 >branch's target.  The second branch does not have a physical delay slot
 >following it; rather, the instruction at the first branch's target acts
 >as the second branch's delay slot.

This is a good explanation of what occurs with delayed branches.  The 88100
has a few more variations of this effect because the instruction set lets a
branch either execute the instruction in its delay slot or skip it.  This
allows, for instance, a "visit" that does not execute the visited
instruction.  Other combinations of execute/no-execute may produce even more
obscure code sequencing.  Before using any of these tricks, see the warning
below.
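
As a rough illustration of the sequencing described above, here is a toy
Python model of a machine with one delay slot, built around a PC / next-PC
pair.  It is only a sketch: the opcode names (br_delay, br_skip, op, halt),
the addresses, and the run() helper are invented for this example, and it
does not model the actual pipeline of the 88100 or the Am29000.

```python
# Toy model of a one-delay-slot machine using a PC / next-PC pair.
# Opcode names and addresses are hypothetical, chosen only for illustration.

def run(program, max_steps=20):
    """Return the addresses of the instructions executed, in order."""
    pc, npc = 0, 1
    trace = []
    for _ in range(max_steps):
        if pc not in program:
            break
        op, target = program[pc]
        trace.append(pc)
        if op == "halt":
            break
        if op == "br_delay":       # branch that executes its delay slot
            pc, npc = npc, target
        elif op == "br_skip":      # branch that waives its delay slot
            pc, npc = target, target + 1
        else:                      # ordinary instruction
            pc, npc = npc, npc + 1
    return trace

# A delayed branch whose delay slot holds another delayed branch.
visit = {
    0: ("br_delay", 10),
    1: ("br_delay", 20),   # sits in the delay slot of the branch at 0
    10: ("op", None),      # first target: "visited" for one instruction
    11: ("op", None),
    20: ("op", None),      # second target: execution continues here
    21: ("halt", None),
}

# Same layout, but the second branch waives its delay slot.
skip_visit = dict(visit)
skip_visit[1] = ("br_skip", 20)

print(run(visit))       # addresses executed for the plain "visit"
print(run(skip_visit))  # addresses executed when the visit is waived
```

In this model the first program executes exactly one instruction at the
first target before moving on to the second target, while the second program
goes to the second target without executing the instruction at the first.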

 >|  - Do existing processors have a seperate adder for use by branches only, or
 >| are there restrictions on the possible instructions which can be put in the
 >| shadow?
 >|  - Are there any processors which use branch delaying and don't have a
 >| seperate address calculation adder?
 >
 >I know of no processors that have delayed branches and restrictions on
 >delay instruction types.  However, there really isn't any resource
 >contention between a branch and its delay slot.  The delay slot is just
 >being decoded by the time the branch executes.  The reason that a
 >processor may have a separate adder to calculate PC-relative branch
 >offset addresses is that they must perform an instruction cache lookup
 >during the branch execution, and thus must have the address calculated
 >by the end of the decode stage (at least, this is the way it works on
 >the Am29000).
 >
 >-- Tim Olson
 >Advanced Micro Devices
 >(tim@amd.com)

The 88100 does have a separate adder for branches, like most other chips with
delayed branching, but the User's Manual explicitly states that, for future
compatibility, the delay slot instruction cannot be a trap, jump, branch, or
any other instruction that modifies the instruction pointer.  In other words,
this behavior, while it may be useful for at least one special-case
application, is not guaranteed to work the same on all implementations of the
architecture.  The restriction leaves room for different pipelining schemes
in the future without unduly tying the architects' hands.
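
A restriction like this is easy to check mechanically.  The sketch below is
a hypothetical lint pass, not a Motorola tool: it assumes one instruction per
line, assumes the ".n" suffix marks a branch that executes its delay slot,
and uses a mnemonic list that is my own and may not be exhaustive.

```python
# Hypothetical delay-slot checker.  The mnemonic set is an assumption and is
# not guaranteed to cover every instruction that changes the instruction
# pointer.
CONTROL_TRANSFER = {
    "br", "bsr", "bcnd", "bb0", "bb1", "jmp", "jsr",
    "tb0", "tb1", "tbnd", "tcnd", "rte",
}

def mnemonic(line):
    """Return the lowercase mnemonic of an assembly line, or '' if blank."""
    tokens = line.split()
    return tokens[0].lower() if tokens else ""

def bad_delay_slots(lines):
    """Return indices of lines that sit in a delay slot and transfer control."""
    bad = []
    for i in range(len(lines) - 1):
        m = mnemonic(lines[i])
        is_delayed = m.endswith(".n") and m.split(".")[0] in CONTROL_TRANSFER
        if is_delayed and mnemonic(lines[i + 1]).split(".")[0] in CONTROL_TRANSFER:
            bad.append(i + 1)
    return bad

# Illustrative input; labels and operands are made up.
example = [
    "br.n   L1",
    "br     L2",          # branch in a delay slot: flagged
    "add    r2,r3,r4",
    "bcnd.n eq0,r5,L3",
    "add    r2,r2,1",     # ordinary instruction in a delay slot: fine
]

print(bad_delay_slots(example))
```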

Marvin Denman
Motorola 88000 Design