Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!uwvax!astroatc!prairie!dan
From: dan@prairie.UUCP (Daniel M. Frank)
Newsgroups: comp.sys.intel
Subject: Re: Code prefetch
Message-ID: <361@prairie.UUCP>
Date: Fri, 21-Nov-86 11:20:17 EST
Article-I.D.: prairie.361
Posted: Fri Nov 21 11:20:17 1986
Date-Received: Fri, 21-Nov-86 20:35:27 EST
References: <242@bobkat.UUCP>
Reply-To: dan@prairie.UUCP (Daniel M. Frank)
Distribution: na
Organization: Prairie Computing, Madison, Wisconsin
Lines: 36
Keywords: code prefetch compiler basic-blocks

In article <242@bobkat.UUCP> m5d@bobkat.UUCP (Mr Mike McNally) writes:
>The obvious problem is that branch instructions do not cooperate
>with the idea of prefetching.  Attempts to follow different paths
>beyond a branch seem like Koyaanisqatsi solutions to me.

   Many pipelined machines built over roughly the last 20 years have had
two prefetch queues and fetched down both paths of a branch until the
branch was resolved.  This is a good solution because several
instructions ahead of the branch may still be in the pipeline, so when
the branch is fetched the condition it tests may not have been computed
yet, and you have no idea whether it will be taken.  Also, prefetch
queues are seldom very long, so a branch followed within the queue by
another branch is infrequent enough not to be costly (usually it just
stalls fetching for a while).
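A toy cost model makes the trade-off concrete.  This is a sketch under
stated assumptions, not a description of any particular machine: the
trace is a string where '.' is an ordinary instruction, 'T' a taken
branch and 'N' an untaken one; the single-path fetcher follows only the
fall-through path, and the dual-path fetcher stalls only when a second
branch shows up fewer than `queue_len` instructions after the previous
one.  The function names and cycle counts are hypothetical.

```python
# Toy model only; names, trace format and cycle counts are hypothetical.

def single_path_penalty(trace: str, refill_cycles: int) -> int:
    # Only the fall-through path is prefetched, so every taken branch ('T')
    # throws away the queue and pays a refill penalty.
    return trace.count("T") * refill_cycles

def dual_path_penalty(trace: str, queue_len: int, stall_cycles: int) -> int:
    # Both paths are prefetched, so an isolated branch costs nothing.  Fetch
    # stalls only when another branch ('T' or 'N') arrives while the previous
    # one may still be unresolved, approximated as "fewer than queue_len
    # instructions later".
    penalty = 0
    last_branch = None
    for pos, op in enumerate(trace):
        if op in "TN":
            if last_branch is not None and pos - last_branch < queue_len:
                penalty += stall_cycles
            last_branch = pos
    return penalty

# Hypothetical trace: '.' = ordinary instruction, 'T' = taken, 'N' = not taken.
trace = "....T.....N..T.T"
print(single_path_penalty(trace, 3), dual_path_penalty(trace, 4, 2))  # 9 4
```

With these made-up numbers the single-path fetcher pays 9 cycles for
the three taken branches, while the dual-path fetcher pays 4, all of it
from the closely spaced branches near the end of the trace.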
   
   The idea has been broached (and I think used in some experimental
processors) of "branch usually" and "branch seldom" instructions, with
which the compiler tells the processor which way a branch is most
likely to go.  If these predictions are reasonably accurate, you pay
for the branch only in the occasional case when it goes against the
prediction.  How to make accurate predictions is a current research
topic.
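Here is a minimal sketch of the static-hint idea, assuming a
hypothetical one-bit hint per branch (True for "branch usually", False
for "branch seldom") and a fixed mispredict penalty.  Choosing the hint
by majority vote over a profiled run is just one simple heuristic
assumed for illustration, not how any particular compiler does it.

```python
def hinted_penalty(outcomes: list[bool], hint_taken: bool, mispredict_cycles: int) -> int:
    # The branch costs something only when its actual outcome disagrees
    # with the compiler's hint.
    return sum(mispredict_cycles for taken in outcomes if taken != hint_taken)

def choose_hint(profile: list[bool]) -> bool:
    # Hypothetical heuristic: hint "branch usually" if the branch was taken
    # at least half the time in a profiling run.
    return 2 * sum(profile) >= len(profile)

# Loop-closing branch: taken 9 times, falls through once on loop exit.
loop_branch = [True] * 9 + [False]
hint = choose_hint(loop_branch)
print(hint, hinted_penalty(loop_branch, hint, 4),
      hinted_penalty(loop_branch, not hint, 4))  # True 4 36
```

With the hint set to "branch usually", only the final loop exit costs
anything (4 cycles); the opposite hint would cost 36.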

   Of course, all of this is complicated when the branch target address
is still being computed as the branch instruction reaches the decode
stage of the pipeline.  This usually happens when the target address is
held in a register.  Instruction fetch may then have to halt until the
interlock is resolved.  Careful coding can reduce the problem somewhat,
for instance by computing the target register well ahead of the branch.
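The interlock can be illustrated with a toy scheduling model.  The
assumptions are all hypothetical: one instruction is decoded per cycle,
every result becomes usable a fixed `latency` cycles after its
instruction is decoded, and a `jr` ("jump to the address in a
register") halts fetch until its target register is ready.  The
mnemonics and register names are made up.  Hoisting the instruction
that produces the target is one example of the kind of careful coding
that hides the stall.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    op: str                     # hypothetical mnemonic; "jr" = jump via register
    dest: Optional[str] = None  # register written, if any
    src: Optional[str] = None   # register holding the branch target (for "jr")

def fetch_stall(program: list[Instr], latency: int) -> int:
    """Total cycles fetch halts waiting for register-held branch targets."""
    ready_at: dict[str, int] = {}  # register -> cycle its value is usable
    cycle = 0
    stalled = 0
    for ins in program:
        if ins.op == "jr":
            # Interlock: wait until the target register has been written.
            wait = max(0, ready_at.get(ins.src, 0) - cycle)
            stalled += wait
            cycle += wait
        if ins.dest is not None:
            ready_at[ins.dest] = cycle + latency
        cycle += 1
    return stalled

# Target loaded right before the jump: fetch stalls.
naive = [Instr("add", "r1"), Instr("add", "r2"), Instr("add", "r3"),
         Instr("load", "r9"), Instr("jr", src="r9")]
# Same work with the target load hoisted to the top: no stall.
scheduled = [Instr("load", "r9"), Instr("add", "r1"), Instr("add", "r2"),
             Instr("add", "r3"), Instr("jr", src="r9")]
print(fetch_stall(naive, 3), fetch_stall(scheduled, 3))  # 2 0
```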

   One processor under development at the UW actually EXECUTES down
both branch paths (I won't say any more about that, as a paper is in
the works by the folks developing it).

-- 
    Dan Frank
    uucp: ... uwvax!prairie!dan
    arpa: dan%caseus@spool.wisc.edu