From: firth@bd.sei.cmu.edu (Robert Firth)
Newsgroups: net.arch
Subject: RISC delayed branch
Message-ID: <292@bd.sei.cmu.edu>
Date: Tue, 12-Aug-86 11:34:22 EDT
Organization: Carnegie-Mellon University, SEI

Can an Assembler do as well as a Compiler in
moving code to fill the Noop after a delayed branch?

When I tried it that way, the answer was "almost
as well".  Actually, what I used was a typical
"moving window" peephole optimisation pass over
the generated assembler code, but the assembler
itself could do that too.

To a rough approximation, there is about a 40%
chance that two adjacent instructions are independent,
i.e. can be permuted, assuming neither is a conditional
branch.  What we want to do is change

	ACTION
	BRANCH

into

	BRANCH
	ACTION

and also

	ACTION
	TEST
	COND BRANCH

into

	TEST
	COND BRANCH
	ACTION
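
The two rewrites above can be sketched as a small
peephole pass.  This is only an illustration, not the
pass actually used: the Instr record, its read/write
sets, the "cc" pseudo-register for condition codes and
the function names are all assumptions made for the
example.  It assumes every branch arrives followed by
a NOP in its delay slot.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical instruction record (not from the original post)."""
    op: str
    reads: frozenset = frozenset()    # resources read, e.g. {"r2", "cc"}
    writes: frozenset = frozenset()   # resources written
    is_branch: bool = False
    is_label: bool = False


def independent(a, b):
    """True if a and b may be permuted: no read/write or write/write clash."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def fill_delay_slots(code, window=4):
    """Replace the NOP after each branch with an earlier instruction.

    Candidates are the (window - 1) instructions just before the branch.
    A candidate may move only if it is independent of everything it has
    to hop over, including any TEST and the branch itself.
    """
    code = list(code)
    i = 0
    while i < len(code) - 1:
        if code[i].is_branch and code[i + 1].op == "NOP":
            lo = max(0, i - (window - 1))
            for j in range(i - 1, lo - 1, -1):
                cand = code[j]
                # Stop at block boundaries and at earlier delay slots.
                if cand.is_branch or cand.is_label or cand.op == "NOP":
                    break
                if j > 0 and code[j - 1].is_branch:
                    break
                if all(independent(cand, code[k]) for k in range(j + 1, i + 1)):
                    del code[j]        # branch is now at i - 1, NOP at i
                    code[i] = cand     # candidate takes the delay slot
                    break
        i += 1
    return code


# ACTION / TEST / COND BRANCH / NOP, with the ACTION unrelated to the TEST.
demo = [
    Instr("ADD r1,r2,r3", reads={"r2", "r3"}, writes={"r1"}),
    Instr("CMP r4,0", reads={"r4"}, writes={"cc"}),
    Instr("BEQ L", reads={"cc"}, is_branch=True),
    Instr("NOP"),
]
print([ins.op for ins in fill_delay_slots(demo)])
```

Memory references would need the same treatment as
registers; the simplest conservative choice is a single
"mem" pseudo-resource that every load reads and every
store writes.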

Well, with a window 4 instructions wide, and the
above 40% rule, you can permute something into the
noop slot about 82% of the time, which is pretty
good.
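
As a rough sanity check on that figure, here is a
deliberately crude model.  It assumes each candidate in
the window independently has the same 40% chance of
being movable into the slot, which real code will not
obey exactly, and it is not claimed to be how the
figure above was obtained.  The function name is made
up for the example.

```python
def fill_probability(p_independent, candidates):
    """Chance that at least one of `candidates` instructions can be moved,
    assuming each is movable with probability p_independent, independently
    of the others (a simplifying assumption)."""
    return 1.0 - (1.0 - p_independent) ** candidates


for k in range(1, 5):
    print(k, round(fill_probability(0.4, k), 3))
```

Under this model three candidates give about 78% and
four give about 87%, so a figure in the low 80s is in
the range one would expect.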

The one significant case I found where a compiler
could do better was when a loop could be rotated
to fill the noop, i.e. an ACTION brought down from
the top of iteration N+1 to fill the hole after the
branch back from iteration N.  Not easy for an Assembler,
but not very hard for a compiler using a graph form
of the code.
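
The shape of that rotation can be sketched on a
straight-line loop body.  This shows only the code
motion; the function name and the (label, op) pair
representation are assumptions for the example.  The
hard part, proving the motion safe, is what needs the
graph form and is not shown.  In particular the sketch
assumes the loop is entered only from the top, and that
the first ACTION is harmless if it runs once more when
the loop exits (or that the branch annuls its slot when
not taken).

```python
def rotate_loop(label, body, branch):
    """Rotate a loop so its first instruction fills the branch delay slot.

    label  -- loop label, e.g. "LOOP"
    body   -- list of instruction strings, excluding branch and NOP
    branch -- the backward branch to `label`
    Returns a list of (label_or_None, op) pairs.
    """
    if not body:
        raise ValueError("loop body must contain at least one instruction")
    first = body[0]
    rest = body[1:] + [branch]
    out = [(None, first)]                      # runs once, before loop entry
    for k, op in enumerate(rest):
        out.append((label if k == 0 else None, op))
    out.append((None, first))                  # delay slot: top of iteration N+1
    return out


for lab, op in rotate_loop("LOOP", ["A0", "A1", "A2"], "BRANCH LOOP"):
    print((lab + ":" if lab else "").ljust(6) + op)
```

The rotated loop is the same length as the original
with its NOP; the copy of the first ACTION ahead of the
label takes the place the NOP used to occupy.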

Robert Firth