Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!bbn!bbn.com!slackey
From: slackey@bbn.com (Stan Lackey)
Newsgroups: comp.arch
Subject: Re: Filling branch delay slot with test
Message-ID: <45219@bbn.COM>
Date: 5 Sep 89 17:23:25 GMT
References: <1432@atanasoff.cs.iastate.edu> <26859@winchester.mips.COM> <1437@atanasoff.cs.iastate.edu>
Sender: news@bbn.COM
Reply-To: slackey@BBN.COM (Stan Lackey)
Distribution: na
Organization: Bolt Beranek and Newman Inc., Cambridge MA
Lines: 38

In article <1437@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
>In article <26859> mash@mips.COM (John Mashey) writes:
>}In article <1432> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
>}> AGAIN: JSUB FOO_RTN   ; return FOO in R0
>}>        BEQL AGAIN     ; try again if we
>}>        TEST R0        ; get zero back
>}> Although not without complications, it would seem an
>}> excellent way to have a high branch delay slot fill ratio.
>}Put another way: as much as computer architects would like
>}pipestages whose results are available in advance of their execution,
>}such things are only found in science-fiction......
> No. What I was alluding to was "starting down both paths" of the
> branch and then "dumping the loser".

Another way is to use branch prediction; guess at the direction using
some algorithm (they range from "terrible" to "pretty good") and start
fetching instr's and operands. At least you only prefetch one path
(single instruction cache port, single instruction decoder, etc). You
need to be careful about doing things that are hard to undo when you
turn out to be wrong, though.

Other than in the Multiflow Trace, all the algorithms that I know about
have been implemented in hardware. I wonder if there have been studies
done (or implementations?) of a more conventional architecture where
the branch instruction has information in it (inserted by the compiler,
possibly using runtime statistics) to tell the hardware which way to
predict the branch.

Most of the hardware algorithms work well for inloop situations, where
prediction is done either by looking at the direction of the branch to
keep execution in the loop, or by caching recently-executed branches
and using history analysis. I am wondering about cases like a long
piece of code (such as a system call) where state is tested to control
flow, and very little is inloop. Of course, lots of machines which
depend on an instruction cache are going to perform dismally on this
type of code anyway.

-Stan
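
The three prediction styles mentioned above (branch direction, a cache of recent branch history, and a compiler-inserted hint bit) can be compared with a rough sketch like the one below. Everything in it is an assumption made for illustration: the `Branch` record, the function names, the 2-bit saturating counter used as the "history analysis", the two made-up traces, and the premise that the hint bit comes from a profile that is right for 9 of the 10 straight-line branches. It is a toy model, not a description of any real machine.

```python
from collections import namedtuple

# One dynamic execution of a branch. All field names are hypothetical:
#   pc     - address of the branch
#   target - address it jumps to when taken
#   hint   - compiler-inserted "predict taken" bit (assumed, e.g. from profiling)
#   taken  - what actually happened at run time
Branch = namedtuple("Branch", "pc target hint taken")


def predict_direction(b):
    """Static rule: a backward branch probably closes a loop, so predict taken."""
    return b.target < b.pc


def predict_hint(b):
    """Static rule: trust the bit the compiler put in the instruction."""
    return b.hint


class HistoryPredictor:
    """Cache of recently executed branches with a 2-bit saturating counter each."""

    def __init__(self):
        self.counters = {}  # pc -> counter in 0..3; unseen branches start at 1

    def predict(self, b):
        return self.counters.get(b.pc, 1) >= 2

    def update(self, b):
        c = self.counters.get(b.pc, 1)
        self.counters[b.pc] = min(3, c + 1) if b.taken else max(0, c - 1)


def correct_predictions(trace, predict, update=None):
    """Count how many branches in the trace were predicted correctly."""
    hits = 0
    for b in trace:
        if predict(b) == b.taken:
            hits += 1
        if update is not None:
            update(b)  # train only after the prediction has been scored
    return hits


def loop_trace(iterations=10):
    """One backward branch closing a loop: taken every time except the last."""
    return [Branch(0x40, 0x10, True, i < iterations - 1)
            for i in range(iterations)]


def straight_line_trace(n=10):
    """Forward branches, each executed once, half of them taken.

    Stands in for a long system-call path. The hint is assumed to be
    wrong for exactly one branch (index 3).
    """
    trace = []
    for i in range(n):
        taken = (i % 2 == 0)
        hint = taken if i != 3 else not taken
        trace.append(Branch(0x100 + 8 * i, 0x1000 + 8 * i, hint, taken))
    return trace


if __name__ == "__main__":
    for name, trace in (("loop", loop_trace()),
                        ("straight-line", straight_line_trace())):
        history = HistoryPredictor()
        print(name,
              "direction:", correct_predictions(trace, predict_direction),
              "history:", correct_predictions(trace, history.predict, history.update),
              "hint:", correct_predictions(trace, predict_hint))
```

On the loop trace all three do about equally well (9, 8 and 9 correct out of 10). On the straight-line trace the direction rule and the history cache both get 5 of 10, since every branch is forward and seen only once, while the hint bit gets 9 of 10. That gap is built into the assumed traces, so it only illustrates the question raised in the post; it is not evidence about real code.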