Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site petrus.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!whuxlm!whuxl!houxm!ihnp4!mhuxn!mhuxm!sftig!sftri!sfmag!eagle!ulysses!allegra!bellcore!petrus!karn
From: karn@petrus.UUCP
Newsgroups: net.unix-wizards,net.lan
Subject: Re: KEEPALIVE's do not always work.
Message-ID: <363@petrus.UUCP>
Date: Fri, 31-May-85 21:49:54 EDT
Article-I.D.: petrus.363
Posted: Fri May 31 21:49:54 1985
Date-Received: Sun, 2-Jun-85 02:50:01 EDT
References: <1284@hammer.UUCP>
Organization: Bell Communications Research, Inc
Lines: 63
Xref: watmath net.unix-wizards:13378 net.lan:833

The fundamental problem with TCP keepalives in 4.2BSD, as Jon Postel said, is that they are a "braindamaged hack" (his words). According to the entry in RFC 944 (Official ARPA Internet Protocols) for TCP, "there is no TCP 'probe' mechanism, [i.e., keepalives] and none is needed." I suggest that the best way to deal with this problem is to remove the KEEPALIVE options from all of the 4.2BSD network servers.

This misfeature has caused considerable aggravation in our environment. Connections established from IBM PCs running the MIT PC/IP Telnet code get gratuitously dropped unless you type on them often enough. Apparently, the acknowledgment number contained in the 4.2 "probe" "takes back" a previous acknowledgment. The spec is not clear on this point, but in my opinion the PC code is perfectly entitled to ignore it completely, since it could only be an old duplicate.

This is contrary to the assertion made in the 4.2BSD code: "Saying rcv_nxt-1 lies about what we have received, and by the protocol spec requires the correspondent TCP to respond." As described later, I believe this lie is also to blame for Steve's problem.

What the code SHOULD do is to retransmit the last byte sent to the other end, exactly as if it had been lost in transmission and never acknowledged. This WOULD force the other end to respond.
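
To make the sequence-number argument above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not the 4.2BSD or PC/IP code: the Segment and Endpoint classes, the method names, and the numbers are all hypothetical, sequence numbers are plain integers with no 32-bit wraparound, and the receiver is assumed to be a strict one that silently drops an ACK older than what it already knows to be acknowledged. The sequence number placed in the dataless probe is also an assumption; only its ACK field matters here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    seq: int            # sequence number of the first data byte (or next byte if no data)
    ack: int            # next byte the sender claims to expect from the peer
    data: bytes = b""


@dataclass
class Endpoint:
    snd_una: int        # oldest byte we sent that is not yet acknowledged
    snd_nxt: int        # next sequence number we will send
    rcv_nxt: int        # next sequence number we expect from the peer

    def lying_probe(self) -> Segment:
        # Keepalive as described in the post: no data, and an ACK field that
        # is one less than what we have really received (rcv_nxt - 1).
        return Segment(seq=self.snd_nxt, ack=self.rcv_nxt - 1)

    def honest_probe(self, last_byte: bytes) -> Segment:
        # Suggested alternative: retransmit the last byte we sent, with a
        # truthful ACK field.
        return Segment(seq=self.snd_nxt - 1, ack=self.rcv_nxt, data=last_byte)

    def receive(self, seg: Segment) -> Optional[Segment]:
        """Return the reply segment, or None if the segment is silently ignored."""
        if seg.data:
            # Data is always acknowledged, whether new or a duplicate.
            # (ACK-field processing of data segments is omitted in this toy.)
            if seg.seq <= self.rcv_nxt:
                self.rcv_nxt = max(self.rcv_nxt, seg.seq + len(seg.data))
            return Segment(seq=self.snd_nxt, ack=self.rcv_nxt)
        if seg.ack < self.snd_una:
            # Dataless segment whose ACK "takes back" an earlier one:
            # treated as an old duplicate and dropped without a reply.
            return None
        self.snd_una = min(seg.ack, self.snd_nxt)
        return None


# An idle connection: everything sent in both directions has been acknowledged.
poller = Endpoint(snd_una=100, snd_nxt=100, rcv_nxt=500)
pollee = Endpoint(snd_una=500, snd_nxt=500, rcv_nxt=100)

ignored = pollee.receive(poller.lying_probe())          # no reply at all
answered = pollee.receive(poller.honest_probe(b"x"))    # duplicate byte gets ACKed
print(ignored)
print(answered)
```

In this model the lying probe draws no reply, so the poller would eventually give up on a perfectly healthy connection, while the retransmitted byte is acknowledged with the pollee's true rcv_nxt.
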
The 4.2BSD code notes that there is a problem with a one-way data stream; however, in this case, you might try retransmitting your SYN (since it has a number in the sequence space) along with your Initial Sequence Number, and the other end OUGHT to respond with the desired acknowledgment (hopefully not with a RST).

I have never seen the behavior Steve described on our systems, although this could be because the keepalive timer on our systems is quite a bit shorter than 15 minutes, and no machine can reboot this quickly. I'm not sure I fully understand his problem, but here goes.

When the pollee comes back up, it has forgotten the sequence numbers it was using on the connection. It therefore attempts to formulate its RST response to the poller by using the ACK number contained in the poll (as dictated by the spec). Unfortunately, as I mentioned earlier, the poller is "taking back" an acknowledgment, so the sequence number contained in the RST will be outside the poller's acceptable window. The poller therefore ignores the RST, and Steve's problem occurs.

So it seems that this is another problem caused by the poller lying in its ACK field about what it has received. Fix this and both Steve's problem and the dropped IBM PC Telnet connection problem should go away.

I should point out here that there is a very good reason the spec says to ignore RSTs that lie outside the current window: if you didn't, an old duplicate RST could drop your connection unnecessarily. Therefore Steve's solution #A is unacceptable.

Regarding solution #B, you must always acknowledge data, whether or not it lies inside your window (i.e., whether you have already seen it or not), because this could be a retransmission due to an earlier acknowledgment being lost. So probing with the last transmitted byte (assuming you DON'T lie about your acknowledgment number) ought to work.

I like solution #C the best. Polling is worse than useless in virtually all situations.
It causes even idle connections to get dropped if an intervening gateway goes down for a minute or so. I often have a half dozen idle rlogins going in different windows on my Sun workstation, and having to re-establish them all after somebody reboots a gateway is a real pain. I can understand losing the one I'm actively working on (although I wish the TCP giveup timers were longer), but breaking idle connections is unacceptable.

I assume solution #D is a joke.

Phil
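
The out-of-window RST argument above can be sketched in a few lines of Python. This is an illustration under assumptions, not code from any real TCP: the function names, the window size, and the sequence numbers are hypothetical, and sequence numbers are treated as plain integers with no wraparound. It assumes, as described above, that a host with no record of the connection copies the ACK field of the incoming poll into the sequence number of its RST, and that the poller accepts a RST only if that sequence number falls inside its receive window.

```python
def rst_seq_for_unknown_connection(poll_ack: int) -> int:
    # A rebooted host has no state, so the only sequence number it can use
    # for its RST is the ACK number carried by the poll itself.
    return poll_ack


def rst_acceptable(rst_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    # The poller honors a RST only if it lies inside the current receive
    # window; anything else is treated as a possible old duplicate.
    return rcv_nxt <= rst_seq < rcv_nxt + rcv_wnd


RCV_NXT = 500    # hypothetical: next byte the poller expects
RCV_WND = 4096   # hypothetical receive window size

# Poll that "takes back" an acknowledgment (ACK = rcv_nxt - 1).
lying = rst_acceptable(rst_seq_for_unknown_connection(RCV_NXT - 1), RCV_NXT, RCV_WND)

# Poll with a truthful ACK field (ACK = rcv_nxt).
honest = rst_acceptable(rst_seq_for_unknown_connection(RCV_NXT), RCV_NXT, RCV_WND)

print(lying, honest)   # False True
```

With the lying ACK the RST lands one byte below the window and is discarded, so the poller never learns that the other end rebooted. With an honest ACK the same RST is accepted, and the window check still protects against stray old RSTs.
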