Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!well!nagle
From: nagle@well.UUCP (John Nagle)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: Nagle algo. in Unix-TCP
Keywords: tcp tinygram RPC X11
Message-ID: <13394@well.UUCP>
Date: 29 Aug 89 16:18:35 GMT
References: <2581@lll-lcc.UUCP>
Reply-To: nagle@well.UUCP (John Nagle)
Lines: 188

In article <2581@lll-lcc.UUCP> rwolski@lll-lcc.UUCP (Richard Wolsoi) writes:
>
>My question regards the Nagle algorithms for small-packet avoidance, which
>have been implemented in the various flavors of UNIX now running around.
>
>A colleague of mine has written an RPC mechanism to run over TCP sockets
>on UNIX systems and we are seeing some very strange performance numbers
>for certain kinds of messaging.  An ethernet trace and several minutes with
>the source code convinced us that TCP was delaying both data sends and
>acknowledgements in an effort to avoid silly windows.
>
>Part of the problem comes from the RPC implementation which makes two
>send system calls for each request or reply (no flames please we are dealing
>with some serious history here) in that UNIX sends out the information in
>two different packets.  The first goes out immediately (as the Nagle
>algorithm prescribes) while the second is delayed until the first is acked.
>Unfortunately, the ack is delayed as part of the receiver's half of the
>bargain and so we were seeing a whopping 10 packets per second.
>Now for my question.  Is there any way to defeat the Nagle algorithms under
>standard implementations?  I seem to recall that the Tahoe (or is it Tajo)
>release of 4.3 had such a defeat which was put in for X-11, but we don't
>have systems which are quite so modern.  Specifically, we are using SunOS
>3.4, Ultrix 3.0 and UTS something-or-other with WIN-UTS.

      What seems to have happened here is that several mechanisms in TCP
are interacting with a strange kind of application to produce poor
performance.

      First, "silly window syndrome" is irrelevant here.
Silly window syndrome occurs when the window is full most of the time,
but here, we have the tinygram problem, which occurs when the window is
empty most of the time.  It's a common misconception that the two
problems are the same.  They are not.  They are handled by separate code
in the UNIX implementation, incidentally.

      The problem here seems to be that tinygram elimination and delayed
ACKs, both performance improvements in TCP, are interacting with this
new RPC application.

      Delayed ACKs are something that first appeared in TOPS-20, which
is a system that runs TELNET in remote echo most of the time.  (So do
most UNIX TELNET implementations, not that they really need to.)  The
idea there was to make the bet that when a packet came in on a TCP
connection, the application would probably have something to reply with
shortly.  Therefore, TCP was gimmicked to delay sending an ACK for about
100ms, in hopes that the application layer would send something back and
that the ACK would be piggybacked on the application layer's reply.
Note that this is an assumption built into the transport layer about the
behavior of the application layer.

      Delayed ACKs will cut traffic in half on slow TELNET operations,
and they don't bother FTP.  So they seemed like a big win at the time,
when there were few TCP applications beyond TELNET, FTP, and mail.

      TCP with delayed ACKs and tinygram elimination on the sending side
will support the following kinds of applications well.

      1)  Big data pipes, like FTP.

      2)  TELNET-type interaction.

      3)  Request-reply type transaction protocols.

      But here, we have someone who is trying something that has the
property that it does

            SEND
            SEND
            wait for reply.

This doesn't take turns the way a typical transaction protocol does, so
the guesses built into TCP are bad for this situation.  Thus, this sort
of use creates problems.

      Of course, as the writer points out, doing multiple tiny sends is
a bad thing on general principles.
It's always better to do one big send than lots of little ones, given
that you're not waiting anxiously for an answer.  Sending a 1-byte
message takes 41 bytes across the net, plus any overhead at the link
layer.  It's that 40:1 expansion that led to the need for tinygram
elimination.  Presumably this RPC package is sending a bit more data at
a time, so the expansion factor may be smaller, but it may still be
significant.

      One solution might be to make whatever RPC package he's using go
through the standard UNIX I/O library (stdio) and flush the output
stream just before reading from the input stream or before reading from
another source.  This would improve the buffering situation.

      This is really a buffering problem, after all.  TCP is trying to
protect the network from dumb applications.  We fixed it back in 1985
so that when the application is dumb, the application suffers, not the
network.  We have here a dumb application layer.

					John Nagle
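
      As a rough illustration of the buffering fix suggested above, here
is a minimal sketch in Python rather than C stdio.  Everything named in
it (wire_bytes, send_request, recv_exact, demo, the "HDR:" and "payload"
values) is made up for the example and is not taken from the RPC package
under discussion.  The 40-byte figure assumes a 20-byte IP header plus a
20-byte TCP header with no options, which matches the 41-bytes-for-1
arithmetic above.  A local socket pair stands in for the TCP connection,
so the sketch shows only the write-write-flush pattern, not the timing
effect of tinygram elimination and delayed ACKs.

```python
import socket

HEADER_BYTES = 40  # assumed: 20-byte IP header + 20-byte TCP header, no options


def wire_bytes(payload_sizes):
    """Bytes above the link layer if each payload travels in its own segment."""
    return sum(size + HEADER_BYTES for size in payload_sizes)


def send_request(out, header, body):
    """Queue both parts in a user-space buffer, then hand them over at once."""
    out.write(header)  # buffered, nothing passed to the socket yet
    out.write(body)    # still buffered
    out.flush()        # header + body passed to the socket together


def recv_exact(sock, count):
    """Read exactly count bytes from a stream socket."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed early")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def demo():
    """Send a two-part request with a single flush and read it back."""
    client, server = socket.socketpair()
    out = client.makefile("wb", buffering=4096)
    try:
        send_request(out, b"HDR:", b"payload")
        return recv_exact(server, len(b"HDR:payload"))
    finally:
        out.close()
        client.close()
        server.close()


if __name__ == "__main__":
    print(wire_bytes([1]))     # one 1-byte send
    print(wire_bytes([4, 7]))  # two small sends
    print(wire_bytes([11]))    # the same data as one send
    print(demo())
```

      The point of the design is that the application decides when a
request is complete and flushes once, just before it turns around to
read the reply, so the transport sees one send per turn instead of
SEND SEND.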