Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!voder!pyramid!nsc!icldata!altos86!elxsi!beatnix!mre
From: mre@beatnix.UUCP (Mike Eisler)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Summary: considered useless
Message-ID: <2681@elxsi.UUCP>
Date: 24 May 89 19:37:39 GMT
References: <8905231205.AA00500@expire.lcs.mit.edu>
Sender: news@elxsi.UUCP
Reply-To: mre@beatnix.UUCP (Mike Eisler)
Organization: ELXSI Super Computers, San Jose
Lines: 46
Followup-To:

In article <8905231205.AA00500@expire.lcs.mit.edu> rws@EXPO.LCS.MIT.EDU writes:
>I have a random question that I hope this illustrious audience can answer
>definitively for me (or else point me to a definitive source). Is the BSD
>notion of SO_KEEPALIVE on a TCP connection considered kosher with respect to
>the TCP specification? If so, is its use to be encouraged? Specifically,
>it has been suggested that in the X Window System world, X libraries
>should automatically be setting SO_KEEPALIVE on connections to X servers.

When we brought up X on our BSD systems we tested it against a Visual
Graphics 640 X-term. xterm was set up to be spawned by init. When the
Visual was powered off during a connection, a new xterm wouldn't get
respawned. Analysis of the BSD client showed the old xterm connection
intact, and the xterm process waiting for a message from the Visual
which it would never get.

We figured KEEP alives would solve the problem and put them into the X
library. We found that this cured the problem when the Visual was
powered off for a long time; the KEEP alives eventually timed out
waiting for a response. But for a quick power-off/power-on, KEEPs
didn't help.

KEEPs are implemented as 1 byte segments containing rcv_next-1 and
snd_una-1 as the ACK and SEQ number values, respectively (i.e., a 1 byte
segment that the segment's receiver has already acknowledged, containing
an ACK sequence # for a byte that the segment's sender has already
received).
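
To make the numbers concrete, here is a small sketch in C of the KEEP
segment as described above, plus the usual "is this sequence number in
the receive window" test. The struct and function names are made up
for illustration and are not the BSD kernel's; only snd_una and
rcv_next come from the description. The window test is simplified and
ignores the zero-window special case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state. Field names follow the
 * post (snd_una, rcv_next), not any particular kernel's structures. */
struct conn {
    uint32_t snd_una;   /* oldest byte we sent that is not yet ACKed */
    uint32_t rcv_next;  /* next byte we expect from the peer */
    uint32_t rcv_wnd;   /* size of our receive window, in bytes */
};

/* Only the header fields that matter for this discussion. */
struct seg {
    uint32_t seq;
    uint32_t ack;
    uint32_t len;
};

/* Build a KEEP as described: a 1 byte segment the peer has already
 * acknowledged, carrying an ACK for a byte we have already received. */
static struct seg make_keep(const struct conn *c)
{
    struct seg s = { c->snd_una - 1, c->rcv_next - 1, 1 };
    return s;
}

/* True if seq lies in [rcv_next, rcv_next + rcv_wnd), computed modulo
 * 2^32 so that sequence number wraparound is handled. */
static bool seq_in_window(const struct conn *c, uint32_t seq)
{
    return (uint32_t)(seq - c->rcv_next) < c->rcv_wnd;
}
```

Note that the ACK value carried by a KEEP, rcv_next-1, is by
construction one below the left edge of the sender's own receive
window, so seq_in_window() is false for it whatever the window size.
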
After the quick power cycle, the Visual is listening for an X
connection, and as expected responds with a 0 byte reset, using
rcv_next-1 as the SEQ number value. After getting the reset, BSD resets
the KEEP alive timer because it has "proof" that the connection is no
longer idle. BSD then proceeds to follow the instructions of section
9.2.15.2 "Reset processing" in MIL-STD-1778 (12 Aug 83):

	" ... A reset is valid if its sequence number is in the
	connection's receive window. ... "

Well, rcv_next-1 is not in the xterm client's window, so the reset is
tossed, *after* the KEEP timer was reset. So the BSD client sends
another KEEP a few seconds later and the process repeats itself. So we
don't get a connection reset, and we don't even get a connection
timeout as a consolation prize.

I suppose we could have "fixed" the BSD code to not reset the KEEP
timer on resets, but we wanted to have something that would work in the
field on existing versions of our O/S. We hacked xterm to send the NOP
request of the X protocol to the server every so often, and this has the
desired effect (I'm putting on my asbestos suit now...) of getting the
immediate reset from the Visual, *within* the client's window.

The KEEP alive feature doesn't seem that well thought out. Nor does
server crash recovery seem well thought out in X.

	-Mike Eisler {uunet,sun}!elxsi!mre