Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!wuarchive!julius.cs.uiuc.edu!apple!amdcad!weitek!jetsun!gg
From: gg@jetsun.weitek.COM
Newsgroups: comp.arch
Subject: loop unrolling (was:Re: Register Count)
Message-ID: <1991Jan14.215401.19522@jetsun.weitek.COM>
Date: 14 Jan 91 21:54:01 GMT
References: <11566@pt.cs.cmu.edu>
Reply-To: gg@WEITEK.COM ()
Organization: WEITEK, Sunnyvale CA
Lines: 15

In article pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>
>If you have *some limited* degree of pipelining, as in contemporary
>implementations, such as the classic three-four stage pipeline that
>overlaps some computation with some control, and especially if this
>pipeline is exposed with things like delayed branches, then unrolling
>buys you nothing at all in time, and loses code space.
>

On the contrary: it can give you bigger basic blocks in the critical
loops, thus making more room for instruction scheduling to minimize
delays.

A different problem with loop unrolling is when you have an instruction
cache: if the unrolled loop code size exceeds the size of the
instruction cache (and the rolled loop fits in it), then your cache
miss rate will increase for that loop.
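
To make the basic-block point concrete, here is a minimal C sketch; it is not code from the thread. The function names (dot_rolled, dot_unrolled4) and the unroll factor of 4 are assumptions chosen purely for illustration. The rolled loop body holds one multiply-add followed by a branch. The unrolled body holds four independent multiply-adds in one basic block, which is the extra room a scheduler could use to fill load or multiply delays. Whether that shows up as a speedup depends on the target machine and compiler, and the unrolled body is several times larger, which is where the instruction cache caveat applies.

```c
#include <stddef.h>

/* Rolled loop: each iteration is a small basic block
   (load, load, multiply, add, branch). */
long dot_rolled(const int *a, const int *b, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Unrolled by 4 (hypothetical factor): one larger basic block per trip.
   The four accumulators do not depend on each other, so their loads and
   multiplies can be interleaved by an instruction scheduler. */
long dot_unrolled4(const int *a, const int *b, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += (long)a[i]     * b[i];
        s1 += (long)a[i + 1] * b[i + 1];
        s2 += (long)a[i + 2] * b[i + 2];
        s3 += (long)a[i + 3] * b[i + 3];
    }

    /* Remainder loop for trip counts that are not a multiple of 4. */
    for (; i < n; i++)
        s0 += (long)a[i] * b[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value for any n, since integer addition can be regrouped freely; with floating-point data the separate accumulators would change the summation order and could change the rounded result.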