Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!wuarchive!julius.cs.uiuc.edu!apple!amdcad!weitek!jetsun!gg
From: gg@jetsun.weitek.COM
Newsgroups: comp.arch
Subject: loop unrolling (was:Re: Register Count)
Message-ID: <1991Jan14.215401.19522@jetsun.weitek.COM>
Date: 14 Jan 91 21:54:01 GMT
References: <PCG.91Jan10162301@odin.cs.aber.ac.uk> <11566@pt.cs.cmu.edu> <PCG.91Jan13174042@odin.cs.aber.ac.uk>
Reply-To: gg@WEITEK.COM ()
Organization: WEITEK, Sunnyvale CA
Lines: 15

In article <PCG.91Jan13174042@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>
>If you have *some limited* degree of pipelining, as in contemporary
>implementations, such as the classic three-four stage pipeline that
>overlaps some computation with some control, and especially if this
>pipeline is exposed with things like delayed branches, then unrolling
>buys you nothing at all in time, and loses code space.
>

On the contrary: it can give you bigger basic blocks in the critical loops,
which gives the instruction scheduler more independent instructions to
interleave, so it can hide load latencies and fill delay slots.
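To make that concrete, here is a minimal C99 sketch (the function names are
hypothetical) of a rolled loop and the same loop unrolled by four by hand.
The unrolled body holds four independent load/multiply/add/store groups in
one basic block, which a scheduler can interleave. It assumes x and y do not
overlap, which is what `restrict` declares; without that guarantee the loads
could not be hoisted above earlier stores.

```c
#include <stddef.h>

/* Rolled: each iteration is a tiny basic block (load, multiply, add,
   store, then increment/compare/branch), leaving little to reorder. */
void axpy_rolled(long a, const long *restrict x, long *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Unrolled by 4: all eight loads are issued before any store, so later
   loads can overlap earlier multiplies, and the loop-control overhead is
   paid once per four elements.  Assumes x and y do not alias. */
void axpy_unrolled4(long a, const long *restrict x, long *restrict y, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        long x0 = x[i], x1 = x[i + 1], x2 = x[i + 2], x3 = x[i + 3];
        long y0 = y[i], y1 = y[i + 1], y2 = y[i + 2], y3 = y[i + 3];
        y[i]     = a * x0 + y0;
        y[i + 1] = a * x1 + y1;
        y[i + 2] = a * x2 + y2;
        y[i + 3] = a * x3 + y3;
    }
    /* Cleanup loop for the n % 4 leftover elements. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result; only the shape of the basic block
the scheduler sees differs.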

A different problem with loop unrolling arises when you have an instruction
cache: if the rolled loop fits in the cache but the unrolled loop does not,
the cache miss rate for that loop will increase.
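A back-of-the-envelope way to see where that limit falls: the sketch below
(hypothetical helper names, a deliberately crude model) treats the loop's
code size as the body replicated `unroll` times plus a fixed amount of
loop-control and cleanup code. It ignores associativity, conflict misses,
alignment, and any code the loop calls, all of which only lower the limit.

```c
#include <stddef.h>

/* Code bytes for a loop whose body (body_bytes) is replicated `unroll`
   times, plus fixed loop-control/cleanup code (overhead_bytes). */
size_t loop_code_bytes(size_t body_bytes, size_t overhead_bytes, size_t unroll)
{
    return body_bytes * unroll + overhead_bytes;
}

/* Largest unroll factor whose code still fits in an I-cache of
   icache_bytes; returns 0 if even the rolled loop does not fit
   (or if body_bytes is 0, which would make the model meaningless). */
size_t max_unroll_that_fits(size_t body_bytes, size_t overhead_bytes,
                            size_t icache_bytes)
{
    if (body_bytes == 0 || icache_bytes < body_bytes + overhead_bytes)
        return 0;
    return (icache_bytes - overhead_bytes) / body_bytes;
}
```

With made-up numbers, a 96-byte body and 32 bytes of overhead in a 1 KB
I-cache give a limit of 10: unrolled 10 times the loop takes 992 bytes,
unrolled 11 times it takes 1088 and starts missing on every iteration.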