Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!decvax!ucbvax!QUABBIN.SCRC.SYMBOLICS.COM!DCP
From: DCP@QUABBIN.SCRC.SYMBOLICS.COM (David C. Plummer)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: TCP performance limitations
Message-ID: <871013100846.2.DCP@KOYAANISQATSI.S4CC.Symbolics.COM>
Date: Tue, 13-Oct-87 10:08:00 EDT
Article-I.D.: KOYAANIS.871013100846.2.DCP
Posted: Tue Oct 13 10:08:00 1987
Date-Received: Thu, 15-Oct-87 00:46:15 EDT
References: <1218@nrcvax.UUCP>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 33

    Date: 8 Oct 87 16:05:47 GMT
    From: csustan!csun!psivax!nrcvax!ihm@LLL-WINKEN.ARPA (Ian H. Merritt)

    >There is a fourth way, which we (Symbolics) have used, that you did
    >not mention:
    >
    >(a) Pick a compile-time unrolling factor, usually a power of 2, say 16 = 2^4.
    >(b) Divide the data length by the unrolling factor, obtaining a quotient
    >    and remainder.  When the unrolling factor is a power of two, the
    >    quotient is a shift and the remainder is a logical AND.
    >(c) Write an unrolled loop whose length is the unrolling factor.  Execute
    >    this loop <quotient> times.
    >(d) Write an un-unrolled loop (whose length is therefore 1).  Execute
    >    this loop <remainder> times.

    Or if you have memory to burn (which is fast becoming a common
    condition), just unroll the loop for the maximum condition and branch
    into it at the appropriate point to process the length of the actual
    packet.

First of all, that's 65535 octets for TCP.

Second, I believe that was one of the three techniques mentioned by the person to whom I was replying.

Third, we (Symbolics) can't do that without playing some really nasty games with the compiler. You see, we're of the opinion that assembly language is a thing of the past, and there aren't any good Lisp constructs for the kind of computed GO necessary to pull this trick off. I can't think of any good tricks in FORTRAN, either.
I'm not familiar enough with Pascal, Ada or C to know whether those higher-level languages allow such things.

Fourth, depending on your CPU architecture, there may be ideal unrolling constants that keep the unrolled loop inside an instruction prefetch buffer; complete unrolling would actually be a degradation.
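The four-step scheme quoted above, steps (a) through (d), might look roughly like this in C. This is a minimal sketch, not Symbolics' code (which was Lisp): the function name `sum_octets_unrolled` and its loop body, which simply adds each octet into a 32-bit accumulator, are hypothetical stand-ins for whatever per-octet work (checksumming, copying) a real TCP implementation does. With an unrolling factor of 16 = 2^4, the quotient is `len >> 4` and the remainder is `len & 15`.

```c
#include <stddef.h>
#include <stdint.h>

/* Step (a): compile-time unrolling factor, a power of two (16 = 2^4). */
#define UNROLL_SHIFT 4
#define UNROLL ((size_t)1 << UNROLL_SHIFT)

/* Hypothetical per-octet work: add each octet into a 32-bit accumulator. */
uint32_t sum_octets_unrolled(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t quotient  = len >> UNROLL_SHIFT;   /* step (b): len / 16 as a shift */
    size_t remainder = len & (UNROLL - 1);    /* step (b): len % 16 as an AND  */

    /* Step (c): unrolled loop of length 16, executed `quotient` times. */
    while (quotient-- > 0) {
        sum += p[0];  sum += p[1];  sum += p[2];  sum += p[3];
        sum += p[4];  sum += p[5];  sum += p[6];  sum += p[7];
        sum += p[8];  sum += p[9];  sum += p[10]; sum += p[11];
        sum += p[12]; sum += p[13]; sum += p[14]; sum += p[15];
        p += UNROLL;
    }

    /* Step (d): un-unrolled loop of length 1, executed `remainder` times. */
    while (remainder-- > 0)
        sum += *p++;

    return sum;
}
```

For a 26-octet buffer, for example, the unrolled loop runs once (16 octets) and the plain loop runs 10 times.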
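On whether C can express the computed entry Ian describes: it can, because a `switch` may place its `case` labels inside a loop body. This idiom is known as Duff's device. The sketch below is a bounded variant, not the fully unrolled 65535-octet version. It unrolls by 8 and jumps into the middle of the body, so the first pass handles `len % 8` octets and every later pass handles 8. The function name `sum_octets_duff` and the octet-summing loop body are hypothetical stand-ins for real per-octet work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-octet work: add each octet into a 32-bit accumulator.
   The switch jumps into the unrolled do-while body at the point that
   leaves exactly len % 8 octets for the first pass. */
uint32_t sum_octets_duff(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    if (len == 0)
        return 0;                    /* the do-while always runs at least once */
    size_t passes = (len + 7) / 8;   /* trips through the loop, counting the partial one */
    switch (len % 8) {
    case 0: do { sum += *p++;
    case 7:      sum += *p++;
    case 6:      sum += *p++;
    case 5:      sum += *p++;
    case 4:      sum += *p++;
    case 3:      sum += *p++;
    case 2:      sum += *p++;
    case 1:      sum += *p++;
            } while (--passes > 0);
    }
    return sum;
}
```

A complete unroll in Ian's sense would drop the surrounding `do`/`while` and list one `case` per possible length, down to 1, which is the code-size cost the fourth point above warns about.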