Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!iuvax!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: delayed branch (& delayed loads!)
Message-ID: <12465@pur-ee.UUCP>
Date: 3 Aug 89 14:41:28 GMT
References: <2246@taux01.UUCP> <1462@l.cc.purdue.edu> <26139@shemp.CS.UCLA.EDU> <33669@apple.Apple.COM>
Reply-To: hankd@pur-ee.UUCP (Hank Dietz)
Organization: Purdue University Engineering Computer Network
Lines: 29

In article <33669@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
>An interesting point to ponder is that delayed loads also have to schedule
>something into their shadows, and they are (very roughly) as frequent as
>branches. Now, there are only so many instructions that can be re-arranged,
>so it is possible that although branches shadows can be filled 70% of the time,
>and load shadows can be filled 70% of the time, it may not be true that you can
>fill both of the 70% of the time. Any comments from someone who has actually
>measured this? It might be interesting to turn off filling of one, and see how
>the percentage of filling the other increases.

I think delayed loads are MORE frequent than delayed branches for most code
and most target machines; however, the two are essentially independent
pipelines. In other words, on a lot of target machines, one can put a
delayed branch in the delay slot of a delayed load (and vice versa). Hence,
with appropriate compiler algorithms, the two don't interfere that much.

I have a student, Ashar Nissar, who is doing his MS thesis on compile-time
scheduling for multiple pipelines. It's a little early to give results, and
numerical results are far too dependent on the precise target machine, but
qualitatively it is safe to say that having multiple INDEPENDENT pipelines
doesn't seem to cause much trouble.

Notice that I didn't say that the usual pipeline scheduling techniques work
all that well...
we are currently using an algorithm based on trying ALL legal & useful
permutations of instruction sequences prior to register allocation. Right
now, we can quickly get an optimal schedule for up to about 16 instructions,
but have to truncate the search for larger schedules.

-hankd@ecn.purdue.edu
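
To make the "try all legal permutations, truncate when it gets too big" idea concrete, here is a rough sketch in Python. It is NOT the algorithm from the thesis work mentioned above; the machine model (one instruction issued per cycle, a result usable `latency` cycles after issue), the node budget `max_nodes`, the simple lower-bound pruning, and every name in it are assumptions made up for illustration. "Legal" here just means the order respects the data dependences.

```python
def schedule_length(order, deps, latency):
    """Cycles used to issue `order`, one instruction per cycle, stalling
    until each producer listed in `deps` has had `latency` cycles."""
    issue = {}
    cycle = 0
    for instr in order:
        ready = max((issue[p] + latency[p] for p in deps.get(instr, ())),
                    default=0)
        cycle = max(cycle, ready)   # stall until operands are available
        issue[instr] = cycle
        cycle += 1                  # next issue slot
    return cycle


def best_schedule(instrs, deps, latency, max_nodes=None):
    """Depth-first search over dependence-respecting orders.
    Returns (order, length). If `max_nodes` is set, the search is cut
    off after that many nodes and the best order found so far is kept."""
    best_order, best_len = None, float("inf")
    visited = 0

    def extend(prefix, remaining):
        nonlocal best_order, best_len, visited
        if max_nodes is not None and visited >= max_nodes:
            return                  # truncated search (hypothetical budget)
        visited += 1
        # Each remaining instruction costs at least one more cycle, so
        # this is a lower bound on any completion of `prefix`.
        bound = schedule_length(prefix, deps, latency) + len(remaining)
        if bound >= best_len:
            return                  # cannot beat the best found so far
        if not remaining:
            best_order, best_len = list(prefix), bound
            return
        placed = set(prefix)
        for instr in instrs:
            if instr in remaining and all(p in placed
                                          for p in deps.get(instr, ())):
                extend(prefix + [instr], remaining - {instr})

    extend([], frozenset(instrs))
    return best_order, best_len


# Two loads, each followed by a use; a load has one delay slot here.
instrs = ["ld_a", "use_a", "ld_b", "use_b"]
deps = {"use_a": ["ld_a"], "use_b": ["ld_b"]}
latency = {"ld_a": 2, "use_a": 1, "ld_b": 2, "use_b": 1}

print(schedule_length(instrs, deps, latency))   # 6: both delay slots stall
print(best_schedule(instrs, deps, latency))     # 4 cycles: each load fills
                                                # the other's delay slot
```

In this toy model a branch delay slot could be treated the same way, as one more latency entry, which is the sense in which the two kinds of slots can fill each other. The search is exponential in the worst case, which is why some cutoff like `max_nodes` is needed once the block grows much past a dozen or so instructions.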