Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!ucbcad!ucbvax!ji.Berkeley.EDU!shebanow
From: shebanow@ji.Berkeley.EDU (Mike Shebanow)
Newsgroups: comp.arch
Subject: Re: Branch prediction in the 532
Message-ID: <18537@ucbvax.BERKELEY.EDU>
Date: Thu, 23-Apr-87 11:27:32 EST
Article-I.D.: ucbvax.18537
Posted: Thu Apr 23 11:27:32 1987
Date-Received: Sat, 25-Apr-87 10:00:45 EST
References: <324@dumbo.UUCP> <165100008@uiucdcsb>
Sender: usenet@ucbvax.BERKELEY.EDU
Reply-To: shebanow@ji.Berkeley.EDU.UUCP (Mike Shebanow)
Distribution: world
Organization: University of California, Berkeley
Lines: 70
Keywords: branch prediction, branch target buffers, BTB
Summary: Lots of good papers... (long)

In article <165100008@uiucdcsb> Arch Robison writes:
>
>Are there any good papers on branch prediction?  I've seen two schemes:
>
>	1) Put an extra bit in the instruction indicating which way the branch
>	   will probably go.  This is a compile-time prediction.
>
>	2) Keep a record of past branches, i.e. run-time prediction.   
>
>Though I've heard of examples of machines with each scheme, I've never
>seen any studies of their efficiency.

There have been many good papers on the subject. Here is a partial
bibliography:

	(Note: ISCA = International Symposium on Computer Architecture,
	IBM TDB = IBM Technical Disclosure Bulletin)

[1] S. McFarling and J. Hennesy, "Reducing the Cost of Branches," Proc.
    13th ISCA, June 1986, pp. 396-403.

[2] J. K. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target
    Buffer Designs," IEEE Computer, vol. 17, #1, Jan 1984.

[3] J. E. Smith, "A Study of Branch Prediction Strategies," 8th ISCA, May 1981,
    pp. 135-148.

[4] J. Losq, "Generalized History Table for Branch Prediction," IBM TDB,
    vol 25, #1, June 1982, pp. 99-101.

[5] A. Liles Jr. and B. E. Wilner, "Branch Prediction Mechanism." IBM TDB,
    vol 22, #7, Dec 1979, pp. 3013-3016.

[6] G. Rao, "Technique for Minimizing Branch Delay Due to Incorrect Branch
    History Table Predictions," IBM TDB, vol 25, #1, June 1982, pp. 97-98.

Based on the results published, the best prediction algorithms use run-time
information. As developed by IBM, a branch target buffer is kept in the
machine (BTB). The BTB is an associative table (direct mapped, n-way set
associative, fully associative, whatever) which uses the address of the
BRANCH as an index. Associated with the branch address are two pieces
of information (usually):

	- target address of the branch (where to go if branch taken -- OR,
	    perhaps the first few instructions at the target address (eg.,
	    AM29000))
	- some prediction information (see below).

The type of prediction information used depends on the algorithm. A simple
scheme which works well is to use a 2 bit counter. The counter is incremented
(until it hits the value 3) when the branch is taken, and decremented (until
it hits the value 0) when the branch is not taken. When a branch is encountered,
it is predicted taken if the counter value is 2 or 3, and predicted not taken
if the counter value is 0 or 1. Lee and Smith describe a more complex scheme
in their paper which provides even better accuracy. Using these runtime
prediction algorithms, accuracies better than 90% have been achieved. With
the opcode method National is using, accuracies in the 70-80% are usual (as
they report).

I want to point out that accuracy is a poor indicator of performance. A much
better measure (for any particular architecture) is instruction block size
(IBS). IBS is defined to be the average number of instructions executed
between branch prediction mistakes. If we have a pipeline of depth 'd',
then an !APPROXIMATE! measure of the efficiency of this pipe (in the presence
of branch stalls, ignoring other types of stalls) is (IBS/(IBS + d)). For
small pipe depths, IBS tends not to matter. For deep pipes though, a small
value for IBS will kill you. I would be curious if anyone on comp.arch 
has these numbers.

Mike Shebanow