Xref: utzoo comp.arch:8564 comp.lang.misc:2702 comp.lang.c:16665
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!bbn!oberon!orion.cf.uci.edu!ucsd!chem.ucsd.edu!tps
From: tps@chem.ucsd.edu (Tom Stockfisch)
Newsgroups: comp.arch,comp.lang.misc,comp.lang.c
Subject: Re: Peephole optimisation
Message-ID: <420@chem.ucsd.EDU>
Date: 2 Mar 89 23:35:24 GMT
References: <740@tetons.UUCP> <76700068@p.cs.uiuc.edu> <671@oracle.oracle.com> <730@microsoft.UUCP> <1153@l.cc.purdue.edu> <8650@aw.sei.cmu.edu> <11201@eddie.MIT.EDU>
Reply-To: tps@chem.ucsd.edu (Tom Stockfisch)
Organization: Chemistry Dept, UC San Diego
Lines: 56

In article <11201@eddie.MIT.EDU> jbs@fenchurch.UUCP (Jeff Siegal) writes:
>>cik@l.cc.purdue.edu (Herman Rubin) writes:
>> [...]And how big is the peephole?

>Every study I've seen claims that three instructions is very nearly
>always sufficient....

>For RISC machines, this is not likely to be an issue, and three
>instructions would (probably) suffice.

I don't think three instructions is enough if your RISC machine has a lot
of parallelization and scheduling is considered.  For instance, our Celerity
has two independent integer processors that share one memory bus.  A fetch
("f") takes 4 cycles if the bus isn't being used by the other processor, and
up to 8 cycles if it is.  If the results of the fetch are not needed
immediately, a processor can keep executing while it waits for the bus.
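
The timing behaviour described above can be roughed out with a toy cycle
counter.  This is only a sketch under stated assumptions, not a description
of the real Celerity pipeline: it assumes one instruction issues per cycle,
that a fetched register becomes usable a fixed number of cycles after the
fetch issues (4 with a free bus, 8 with a busy one), and that an instruction
touching a register still in flight simply stalls.  The names Insn and
run_cycles, and the mnemonics "use" and "op", are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    op: str        # mnemonic; "f" marks a fetch (hypothetical encoding)
    reads: tuple   # registers read
    writes: tuple  # registers written


def run_cycles(seq, fetch_latency):
    """Count cycles for seq under the assumed in-order model.

    Assumptions (not taken from any hardware manual): one instruction
    issues per cycle; a fetch's destination is usable fetch_latency
    cycles after the fetch issues; an instruction that reads or writes
    a register still in flight stalls until that register is ready.
    """
    ready = {}  # register -> first cycle at which its value is usable
    cycle = 0
    for insn in seq:
        regs = insn.reads + insn.writes
        # Issue no earlier than now, and no earlier than every register is ready.
        start = max([cycle] + [ready.get(r, 0) for r in regs])
        if insn.op == "f":
            for r in insn.writes:
                ready[r] = start + fetch_latency
        cycle = start + 1
    # Also wait for any fetch still outstanding at the end.
    return max([cycle] + list(ready.values()))


fetch = Insn("f", ("r1",), ("r1",))
use = Insn("use", ("r1", "r5"), ("r5",))          # consumes the fetched value
others = [Insn("op", (r,), (r,)) for r in ("r2", "r3", "r4")]  # unrelated work

use_early = [fetch, use] + others   # consumer right after the fetch
use_late = [fetch] + others + [use]  # consumer after the unrelated work

for latency in (4, 8):
    print(latency, run_cycles(use_early, latency), run_cycles(use_late, latency))
```

Under these assumptions the early-use ordering costs 8 cycles with a free
bus and 12 with a busy one, while the late-use ordering costs 5 and 9,
because the unrelated work fills cycles that would otherwise be stalls.
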

Thus, a sequence such as

        f       r1, r1  # fetch from address in r1
        rcv     r1      # put result in r1 -- really part of fetch instruc
        bxor    r1, r5  # use the result in another operation
        swll    r2, r2  # do something unrelated to r1
        swll    r3, r3  # ditto
        swll    r4, r4  # ditto

runs much faster if written as

        f       r1, r1
        rcv     r1
        swll    r2, r2  # the "swll"s now execute if 2nd processor
        swll    r3, r3  # is hogging memory bus
        swll    r4, r4
        bxor    r1, r5  # no harm waiting till now to do this

Our peephole optimizer appears to have a 3 instruction window and will
change the first sequence to

        f       r1, r1
        rcv     r1
        swll    r2, r2  # too bad only one "swll" gets moved
        bxor    r1, r5
        swll    r3, r3
        swll    r4, r4

I have hand-moved instruction sequences like this and gotten considerable
speed improvement when more than one process is executing on the machine.

>Multiple passes can often be used to get the same effect as a longer
>peephole (by using intermediate transformations).

I don't think that would help here.  In fact, to do the best job of
scheduling an extremely large window can be necessary.  Perhaps some
specialized scheduler pass would be more practical.
--

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu