Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!uwvax!astroatc!prairie!dan
From: dan@prairie.UUCP (Daniel M. Frank)
Newsgroups: comp.sys.intel
Subject: Re: Code prefetch
Message-ID: <361@prairie.UUCP>
Date: Fri, 21-Nov-86 11:20:17 EST
Article-I.D.: prairie.361
Posted: Fri Nov 21 11:20:17 1986
Date-Received: Fri, 21-Nov-86 20:35:27 EST
References: <242@bobkat.UUCP>
Reply-To: dan@prairie.UUCP (Daniel M. Frank)
Distribution: na
Organization: Prairie Computing, Madison, Wisconsin
Lines: 36
Keywords: code prefetch compiler basic-blocks

In article <242@bobkat.UUCP> m5d@bobkat.UUCP (Mr Mike McNally) writes:
>The obvious problem is that branch instructions do not cooperate
>with the idea of prefetching.  Attempts to follow different paths
>beyond a branch seem like Koyaanisqatsi solutions to me.

Many pipelined machines (for about the last 20 years) have had two
prefetch queues, and fetched down both branch paths for a while until
the branch was resolved.  This is a good solution because there may be
several instructions ahead of the branch in the pipeline, so you have no
idea whether the branch is going to be taken or not.  Also, prefetch
queues are seldom that long, so the case where a branch is followed in
the prefetch by another branch is infrequent enough not to be costly
(usually it just stalls fetching for a while).

The idea has been broached (and I think used in some experimental
processors) of a "branch usually" or "branch seldom" instruction, in
which the compiler indicates to the processor which branch is most
likely to be taken.  If branch predictions are relatively accurate, you
only pay for the branch in the occasional case when it goes against the
prediction.  The question of how to make accurate predictions is a
current research topic.

Of course, this is all complicated when the branch target address is
being computed when the branch instruction hits the decode pipeline step.
This usually happens when the target address is in a register.
Instruction fetch may have to come to a halt until the interlock can be
resolved.  This problem can be reduced somewhat by careful coding.

One processor under development at the UW actually EXECUTES down both
branch paths (I won't say any more about that, as a paper is in the works
by the folks developing it).

--
    Dan Frank
    uucp: ... uwvax!prairie!dan
    arpa: dan%caseus@spool.wisc.edu
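
A rough illustration of the "branch usually" / "branch seldom" point
above, that you only pay when the branch goes against the hint.  This is
a back-of-the-envelope sketch, not a model of any real processor: the
function name, the idea of a fixed per-misprediction penalty in cycles,
and the numbers in the usage lines are all assumptions made for the
example.

```python
def expected_branch_cost(accuracy, mispredict_penalty, predicted_cost=0.0):
    """Average cycles lost per branch when a static hint is followed.

    accuracy           -- fraction of branches that go the hinted way (0..1)
    mispredict_penalty -- cycles lost when the branch goes against the hint
                          (hypothetical, fixed value)
    predicted_cost     -- cycles lost even when the hint is right
                          (assumed 0 unless stated otherwise)
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Weighted average of the two outcomes: hint right, hint wrong.
    return accuracy * predicted_cost + (1.0 - accuracy) * mispredict_penalty


if __name__ == "__main__":
    # Hypothetical 4-cycle refill penalty at a few hint accuracies.
    for acc in (0.5, 0.9, 0.99):
        print(acc, expected_branch_cost(acc, mispredict_penalty=4))
```

Under these assumptions the average cost falls linearly as the hints get
better, which is why the accuracy of the compiler's guess matters more
than the size of the penalty once the guesses are mostly right.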