Xref: utzoo comp.arch:8564 comp.lang.misc:2702 comp.lang.c:16665
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!bbn!oberon!orion.cf.uci.edu!ucsd!chem.ucsd.edu!tps
From: tps@chem.ucsd.edu (Tom Stockfisch)
Newsgroups: comp.arch,comp.lang.misc,comp.lang.c
Subject: Re: Peephole optimisation
Message-ID: <420@chem.ucsd.EDU>
Date: 2 Mar 89 23:35:24 GMT
References: <740@tetons.UUCP> <76700068@p.cs.uiuc.edu> <671@oracle.oracle.com> <730@microsoft.UUCP> <1153@l.cc.purdue.edu> <8650@aw.sei.cmu.edu> <11201@eddie.MIT.EDU>
Reply-To: tps@chem.ucsd.edu (Tom Stockfisch)
Organization: Chemistry Dept, UC San Diego
Lines: 56

In article <11201@eddie.MIT.EDU> jbs@fenchurch.UUCP (Jeff Siegal) writes:
>>cik@l.cc.purdue.edu (Herman Rubin) writes:
>>   [...]And how big is the peephole?

>Every study I've seen claims that three instructions is very nearly
>always sufficient....
>For RISC machines, this is not likely to be an issue, and three
>instructions would (probably) suffice.

I don't think a three-instruction window is enough if your RISC machine has
a lot of parallelism and instruction scheduling is taken into account.
For instance, our Celerity has two independent integer
processors that share one memory bus.  A fetch ("f") takes
4 cycles if the bus isn't being used by the other processor, and up to
8 cycles if it is.  If the result of the fetch is not needed immediately,
a processor can keep executing while it waits for the bus.  Thus, a sequence
such as

	f r1, r1	# fetch from address in r1
	rcv r1		# put result in r1 (really part of the fetch instruction)
	bxor r1, r5	# use the result in another operation
	swll r2, r2	# do something unrelated to r1
	swll r3, r3	# ditto
	swll r4, r4	# ditto

runs much faster if written as

	f r1, r1
	rcv r1
	swll r2, r2	# the "swll"s now execute while the 2nd processor
	swll r3, r3	# is hogging the memory bus
	swll r4, r4
	bxor r1, r5	# no harm waiting till now to do this
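To make the latency argument concrete, here is a toy timing model in Python. It is an illustration only, not Celerity code or its real timing rules. It assumes single issue, in order, one cycle per instruction, a fetched value that becomes usable `fetch_latency` cycles after the `f` issues, and a `rcv` that does not itself stall (which is what the reordering above relies on). The operand roles given to `bxor` and `swll` are guesses, and `Insn` and `total_cycles` are hypothetical names. The model ignores the second processor except for its effect on fetch latency.

```python
from typing import List, NamedTuple, Optional, Tuple

class Insn(NamedTuple):
    text: str                 # assembly as written in the post
    dst: Optional[str]        # register written (None for "f"; the value arrives via "rcv")
    srcs: Tuple[str, ...]     # registers read

    @property
    def op(self) -> str:
        return self.text.split()[0]

def total_cycles(program: List[Insn], fetch_latency: int) -> int:
    """Toy single-issue, in-order timing model (assumed, not the real Celerity):
    every instruction issues in one cycle; the value delivered by "rcv" becomes
    readable fetch_latency cycles after the preceding "f" issued; rcv itself
    does not wait, but any later reader of that register stalls until it arrives."""
    ready = {}          # register -> first cycle at which its value may be read
    cycle = 0           # cycle at which the next instruction issues
    fetch_issue = 0     # issue cycle of the most recent "f"
    for ins in program:
        for reg in ins.srcs:                    # in-order stall on a missing operand
            cycle = max(cycle, ready.get(reg, 0))
        if ins.op == "f":
            fetch_issue = cycle
        elif ins.op == "rcv":
            ready[ins.dst] = fetch_issue + fetch_latency
        elif ins.dst is not None:
            ready[ins.dst] = cycle + 1
        cycle += 1
    return max([cycle, *ready.values()])

# Operand roles below are assumptions: bxor reads r1 and r5 and writes r5;
# each swll reads and writes its own register.
ORIGINAL = [
    Insn("f r1, r1", None, ("r1",)),
    Insn("rcv r1", "r1", ()),
    Insn("bxor r1, r5", "r5", ("r1", "r5")),
    Insn("swll r2, r2", "r2", ("r2",)),
    Insn("swll r3, r3", "r3", ("r3",)),
    Insn("swll r4, r4", "r4", ("r4",)),
]
REORDERED = [ORIGINAL[i] for i in (0, 1, 3, 4, 5, 2)]   # the hand-scheduled order

for latency in (4, 8):   # idle bus and busy bus
    print(latency, total_cycles(ORIGINAL, latency), total_cycles(REORDERED, latency))
```

Under these assumptions it prints `4 8 6` and `8 12 9`: the hand-scheduled order saves 2 cycles with an idle bus and 3 with a busy one. These are outputs of the toy model, not measurements on the machine.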

Our peephole optimizer appears to have a 3-instruction window and will
change the first sequence to

	f r1, r1
	rcv r1
	swll r2, r2	# too bad only one "swll" gets moved
	bxor r1, r5
	swll r3, r3
	swll r4, r4
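A single pattern-matching rule applied through a 3-instruction window is enough to produce exactly that result. The rule below is a guess at how such an optimizer might behave, not the actual Celerity optimizer; `Insn`, `independent` and `peephole3` are hypothetical names, and the operand roles of `bxor` and `swll` are assumed.

```python
from typing import List, NamedTuple, Optional, Tuple

class Insn(NamedTuple):
    text: str                 # assembly as written in the post
    dst: Optional[str]        # register written (None for "f")
    srcs: Tuple[str, ...]     # registers read

    @property
    def op(self) -> str:
        return self.text.split()[0]

def independent(a: Insn, b: Insn) -> bool:
    """True if a and b may swap: neither writes a register the other reads or writes."""
    a_writes = {a.dst} - {None}
    b_writes = {b.dst} - {None}
    return not (a_writes & (set(b.srcs) | b_writes)) and not (b_writes & set(a.srcs))

def peephole3(program: List[Insn]) -> List[Insn]:
    """One left-to-right sweep of a hypothetical 3-instruction window.
    Pattern [rcv rX; insn that reads rX; independent insn]: swap the last two,
    pushing the use of the fetched value down by one slot."""
    out = list(program)
    for i in range(len(out) - 2):
        a, b, c = out[i], out[i + 1], out[i + 2]
        if a.op == "rcv" and a.dst in b.srcs and independent(b, c):
            out[i + 1], out[i + 2] = c, b
    return out

# Assumed operand roles: bxor reads r1, r5 and writes r5; swll rN reads/writes rN.
ORIGINAL = [
    Insn("f r1, r1", None, ("r1",)),
    Insn("rcv r1", "r1", ()),
    Insn("bxor r1, r5", "r5", ("r1", "r5")),
    Insn("swll r2, r2", "r2", ("r2",)),
    Insn("swll r3, r3", "r3", ("r3",)),
    Insn("swll r4, r4", "r4", ("r4",)),
]

once = peephole3(ORIGINAL)
for ins in once:
    print(ins.text)
```

Calling `peephole3(once)` again returns the same list: the window now sees `rcv r1` followed by `swll r2, r2`, which does not read r1, so the pattern no longer matches. Under this particular rule, repeating the pass gains nothing.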

I have hand-moved instruction sequences like this and gotten a considerable
speed improvement when more than one process is executing on the machine.

>Multiple passes can often be used to get the same effect as a longer
>peephole (by using intermediate transformations).

I don't think that would help here.  In fact, to do the best job of
scheduling, an extremely large window can be necessary.  Perhaps a
specialized scheduling pass would be more practical.
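One common form of such a pass is a list scheduler that looks at a whole basic block at once. The sketch below is a generic greedy list scheduler, not anything the Celerity compiler is known to do. It builds register dependences (read-after-write, write-after-read, write-after-write) across the entire block, then each cycle issues the earliest instruction whose operands are already available, stalling only when none is. `Insn`, `depends`, `list_schedule` and the `fetch_latency` estimate are hypothetical, and the operand roles of `bxor` and `swll` are assumed.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple

class Insn(NamedTuple):
    text: str                 # assembly as written in the post
    dst: Optional[str]        # register written (None for "f")
    srcs: Tuple[str, ...]     # registers read

    @property
    def op(self) -> str:
        return self.text.split()[0]

def depends(earlier: Insn, later: Insn) -> bool:
    """True if `later` must stay after `earlier` (RAW, WAR or WAW on a register)."""
    raw = earlier.dst is not None and earlier.dst in later.srcs
    war = later.dst is not None and later.dst in earlier.srcs
    waw = earlier.dst is not None and earlier.dst == later.dst
    return raw or war or waw

def list_schedule(block: List[Insn], fetch_latency: int) -> List[Insn]:
    """Greedy list scheduling over a whole basic block (no window limit).
    fetch_latency is the scheduler's assumed delay from "f" to a usable value."""
    n = len(block)
    preds: List[Set[int]] = [set() for _ in range(n)]
    fetch_of: Dict[int, int] = {}            # rcv index -> index of its fetch
    for j in range(n):
        for i in range(j):
            if depends(block[i], block[j]):
                preds[j].add(i)
        if block[j].op == "rcv":             # a rcv belongs to the nearest earlier f
            fetches = [i for i in range(j) if block[i].op == "f"]
            if fetches:
                fetch_of[j] = fetches[-1]
                preds[j].add(fetches[-1])

    reg_ready: Dict[str, int] = {}           # register -> first cycle its value is readable
    issue_at: Dict[int, int] = {}            # instruction index -> issue cycle
    order: List[int] = []
    cycle = 0

    def available(j: int) -> int:
        return max((reg_ready.get(r, 0) for r in block[j].srcs), default=0)

    while len(order) < n:
        # candidates: unscheduled instructions whose predecessors have all issued
        cands = [j for j in range(n)
                 if j not in issue_at and all(p in issue_at for p in preds[j])]
        ready_now = [j for j in cands if available(j) <= cycle]
        # prefer one that can issue right away; otherwise stall for the soonest one
        j = ready_now[0] if ready_now else min(cands, key=lambda k: (available(k), k))
        cycle = max(cycle, available(j))
        issue_at[j] = cycle
        ins = block[j]
        if ins.op == "rcv":
            f = fetch_of.get(j)
            start = issue_at[f] if f is not None else cycle
            reg_ready[ins.dst] = start + fetch_latency
        elif ins.dst is not None:
            reg_ready[ins.dst] = cycle + 1
        order.append(j)
        cycle += 1
    return [block[j] for j in order]

# Assumed operand roles: bxor reads r1, r5 and writes r5; swll rN reads/writes rN.
ORIGINAL = [
    Insn("f r1, r1", None, ("r1",)),
    Insn("rcv r1", "r1", ()),
    Insn("bxor r1, r5", "r5", ("r1", "r5")),
    Insn("swll r2, r2", "r2", ("r2",)),
    Insn("swll r3, r3", "r3", ("r3",)),
    Insn("swll r4, r4", "r4", ("r4",)),
]

for ins in list_schedule(ORIGINAL, fetch_latency=8):
    print(ins.text)
```

Given the busy-bus figure of 8 cycles, it emits the hand-scheduled order with all three `swll`s ahead of `bxor`. With an estimate of 4, it places `bxor` after only two of them, since by then the fetched value is assumed to have arrived. The latency estimate is a choice the scheduler has to make up front.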
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu