Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!amdcad!ames!sdcsvax!ucbvax!utzoo.UUCP!henry
From: henry@utzoo.UUCP
Newsgroups: comp.protocols.tcp-ip
Subject: Re: TCP performance limitations
Message-ID: <8710230618.AA23418@ucbvax.Berkeley.EDU>
Date: Fri, 23-Oct-87 01:43:18 EST
Article-I.D.: ucbvax.8710230618.AA23418
Posted: Fri Oct 23 01:43:18 1987
Date-Received: Sun, 25-Oct-87 13:38:40 EST
References: <1218@nrcvax.UUCP> <871013100846.2.DCP@KOYAANISQATSI.S4CC.Symbolics.COM>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 19

> ... Fourth, depending on your CPU architecture, there
> may be ideal unrolling constants which would keep the unrolled loop
> inside an instruction prefetch buffer; complete unrolling would actually
> be a degradation.

In particular, almost any CPU with a cache -- which means most anything
above the PC level nowadays -- will have an optimum degree of unrolling
for loops that iterate a given number of times.  It's not just a question
of whether the loop will fit; eventually the extra main-memory fetches
needed to get a larger loop into the cache wipe out the gains from
reduced loop-control overhead.  For straightforward caches (with a loop
that will *fit* in the cache!), elapsed time versus degree of unrolling
is a nice smooth curve with a quite marked minimum.  From the look I took
at this, if the ratio of your cache speed to memory speed isn't striking,
and your loop control is not grossly costly (due to, e.g., pipeline
breaks), the minimum has a good chance of falling at a fairly modest
unrolling factor, maybe 8 or 16.

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry
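A toy cost model can make the trade-off concrete. It is purely hypothetical and does not come from the post. It assumes the unrolled loop fits in the cache, starts cold, and stays resident, so each line of loop code is fetched from main memory exactly once. Loop-control cost falls roughly as 1/k while the one-time cache-fill cost grows roughly as k, and that combination is what yields a curve with a single marked minimum. The function names, parameter names, and numbers are made up for illustration, not measured on any machine.

```python
import math


def elapsed_cycles(k, n_iters, work, loop_ctl, body_bytes, line_bytes, miss_penalty):
    """Toy cost (in cycles) of an n_iters-iteration loop unrolled k times.

    Hypothetical model: the unrolled loop fits in the cache, starts cold,
    and stays resident, so each cache line of loop code is fetched from
    main memory exactly once.
    """
    useful = n_iters * work  # the real work; does not depend on k
    # One branch/counter update per trip around the unrolled loop; a partial
    # final trip is rounded up (a simplification standing in for cleanup code).
    control = math.ceil(n_iters / k) * loop_ctl
    # Code footprint grows linearly with k, so more lines must be fetched.
    fill = math.ceil(k * body_bytes / line_bytes) * miss_penalty
    return useful + control + fill


def best_unroll(n_iters, **costs):
    """Return (k, cycles) with the fewest cycles over k = 1..n_iters."""
    candidates = ((k, elapsed_cycles(k, n_iters, **costs)) for k in range(1, n_iters + 1))
    return min(candidates, key=lambda kc: kc[1])


# Illustrative numbers only, not measurements of any real machine.
costs = dict(work=1, loop_ctl=3, body_bytes=4, line_bytes=16, miss_penalty=10)
print(best_unroll(256, **costs))  # (16, 344)
```

Ignoring the rounding, this model's optimum is near k = sqrt(n_iters * loop_ctl * line_bytes / (body_bytes * miss_penalty)), about 17.5 with these numbers. A slower main memory (larger miss_penalty) pulls the optimum down, while costlier loop control pushes it up.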