Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!seismo!rutgers!ames!ucbcad!ucbvax!ji.Berkeley.EDU!shebanow From: shebanow@ji.Berkeley.EDU (Mike Shebanow) Newsgroups: comp.arch Subject: Re: Branch prediction in the 532 Message-ID: <18537@ucbvax.BERKELEY.EDU> Date: Thu, 23-Apr-87 11:27:32 EST Article-I.D.: ucbvax.18537 Posted: Thu Apr 23 11:27:32 1987 Date-Received: Sat, 25-Apr-87 10:00:45 EST References: <324@dumbo.UUCP> <165100008@uiucdcsb> Sender: usenet@ucbvax.BERKELEY.EDU Reply-To: shebanow@ji.Berkeley.EDU.UUCP (Mike Shebanow) Distribution: world Organization: University of California, Berkeley Lines: 70 Keywords: branch prediction, branch target buffers, BTB Summary: Lots of good papers... (long) In article <165100008@uiucdcsb> Arch Robison writes: > >Are there any good papers on branch prediction? I've seen two schemes: > > 1) Put an extra bit in the instruction indicating which way the branch > will probably go. This is a compile-time prediction. > > 2) Keep a record of past branches, i.e. run-time prediction. > >Though I've heard of examples of machines with each scheme, I've never >seen any studies of their efficiency. There have been many good papers on the subject. Here is a partial bibliography: (Note: ISCA = International Symposium on Computer Architecture, IBM TDB = IBM Technical Disclosure Bulletin) [1] S. McFarling and J. Hennesy, "Reducing the Cost of Branches," Proc. 13th ISCA, June 1986, pp. 396-403. [2] J. K. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Designs," IEEE Computer, vol. 17, #1, Jan 1984. [3] J. E. Smith, "A Study of Branch Prediction Strategies," 8th ISCA, May 1981, pp. 135-148. [4] J. Losq, "Generalized History Table for Branch Prediction," IBM TDB, vol 25, #1, June 1982, pp. 99-101. [5] A. Liles Jr. and B. E. Wilner, "Branch Prediction Mechanism." IBM TDB, vol 22, #7, Dec 1979, pp. 3013-3016. [6] G. Rao, "Technique for Minimizing Branch Delay Due to Incorrect Branch History Table Predictions," IBM TDB, vol 25, #1, June 1982, pp. 97-98. Based on the results published, the best prediction algorithms use run-time information. As developed by IBM, a branch target buffer is kept in the machine (BTB). The BTB is an associative table (direct mapped, n-way set associative, fully associative, whatever) which uses the address of the BRANCH as an index. Associated with the branch address are two pieces of information (usually): - target address of the branch (where to go if branch taken -- OR, perhaps the first few instructions at the target address (eg., AM29000)) - some prediction information (see below). The type of prediction information used depends on the algorithm. A simple scheme which works well is to use a 2 bit counter. The counter is incremented (until it hits the value 3) when the branch is taken, and decremented (until it hits the value 0) when the branch is not taken. When a branch is encountered, it is predicted taken if the counter value is 2 or 3, and predicted not taken if the counter value is 0 or 1. Lee and Smith describe a more complex scheme in their paper which provides even better accuracy. Using these runtime prediction algorithms, accuracies better than 90% have been achieved. With the opcode method National is using, accuracies in the 70-80% are usual (as they report). I want to point out that accuracy is a poor indicator of performance. A much better measure (for any particular architecture) is instruction block size (IBS). IBS is defined to be the average number of instructions executed between branch prediction mistakes. If we have a pipeline of depth 'd', then an !APPROXIMATE! measure of the efficiency of this pipe (in the presence of branch stalls, ignoring other types of stalls) is (IBS/(IBS + d)). For small pipe depths, IBS tends not to matter. For deep pipes though, a small value for IBS will kill you. I would be curious if anyone on comp.arch has these numbers. Mike Shebanow