Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!voder!pyramid!nsc!icldata!altos86!elxsi!beatnix!mre
From: mre@beatnix.UUCP (Mike Eisler)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Summary: considered useless
Message-ID: <2681@elxsi.UUCP>
Date: 24 May 89 19:37:39 GMT
References: <8905231205.AA00500@expire.lcs.mit.edu>
Sender: news@elxsi.UUCP
Reply-To: mre@beatnix.UUCP (Mike Eisler)
Organization: ELXSI Super Computers, San Jose
Lines: 46
Followup-To:

In article <8905231205.AA00500@expire.lcs.mit.edu> rws@EXPO.LCS.MIT.EDU writes:
>I have a random question that I hope this illustrious audience can answer
>definitively for me (or else point me to a definitive source).  Is the BSD
>notion of SO_KEEPALIVE on a TCP connection considered kosher with respect to
>the TCP specification?  If so, is its use to be encouraged?  Specifically,
>it has been suggested that in the X Window System world, X libraries
>should automatically be setting SO_KEEPALIVE on connections to X servers.

When we brought up X on our BSD systems we tested it against a Visual Graphics
640 X-term. xterm was set up to be spawned by init. When the Visual was powered
off during a connection, a new xterm wouldn't get respawned. Analysis
of the BSD client showed the old xterm connection intact, and the xterm
process waiting for a message from the Visual which it would never get. We
figured KEEP alives would solve the problem and put them into the X
library. We found that this cured the problem when the Visual was powered
off for a long time; the KEEP alives eventually timed out waiting for a
response.
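
For reference, turning KEEP alives on for a connection is a single
setsockopt() call on the socket. A minimal sketch, assuming a BSD-style
sockets interface; enable_keepalive is a hypothetical helper name used
here for illustration, not the code that went into the X library:

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: ask TCP to probe the peer when the connection
 * has been idle for a while.  Returns 0 on success, -1 on failure
 * (with errno set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

How long the connection must be idle before the first probe, and how
many probes go unanswered before the connection is dropped, are decided
by the TCP implementation, not by this call.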

But for a quick power-off/power-on, KEEPs didn't help. KEEPs are
implemented as 1 byte segments containing rcv_next-1,snd_una-1 as the
ACK and SEQ number values (i.e., a 1 byte segment that the segment's
receiver has already acknowledged, containing an ACK sequence # for a
byte that the segment's sender has already received).  The Visual is
listening for an X connection, and as expected responds with a 0 byte
reset, using rcv_next-1 as the SEQ number value.  After getting the
reset, BSD resets the KEEP alive timer because it has "proof" that the
connection is no longer idle. BSD then proceeds to follow instructions
of section 9.2.15.2 "Reset processing" in MIL-STD-1778 (12 Aug 83):

	" ... A reset is valid if its sequence number is in the connection's
	receive window. ... "
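
In code, that validity test comes down to one unsigned comparison. A
minimal sketch, assuming 32-bit sequence numbers and a nonzero receive
window; the names seq_in_window, keepalive_ack, data_ack and rst_seq_for
are made up for illustration and are not taken from the BSD sources:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2^32. */
bool seq_in_window(tcp_seq seq, tcp_seq rcv_nxt, uint32_t rcv_wnd)
{
    return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}

/* ACK field of a KEEP alive probe as described in this article. */
tcp_seq keepalive_ack(tcp_seq rcv_nxt)
{
    return rcv_nxt - 1;
}

/* ACK field of an ordinary data segment. */
tcp_seq data_ack(tcp_seq rcv_nxt)
{
    return rcv_nxt;
}

/* A host with no matching connection answers a segment that carries
 * an ACK with a reset whose SEQ is that segment's ACK field. */
tcp_seq rst_seq_for(tcp_seq seg_ack)
{
    return seg_ack;
}
```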

Well, rcv_next-1 is not in the xterm client's window, so the reset is
tossed, *after* the KEEP timer was reset. So the BSD client sends
another KEEP a few seconds later and the process repeats itself.  So
we don't get a connection reset, and we don't even get a connection
timeout as a consolation prize.  I suppose we could have "fixed" the
BSD code to not reset the KEEP timer on resets, but we wanted to have
something that would work in the field on existing versions of our
O/S.  We hacked xterm to send the NOP request of the X protocol to
the server every so often and this has the desired effect (I'm putting
on my asbestos suit now...) of getting the immediate reset from the
Visual, *within* the client's window. The KEEP alive feature doesn't
seem that well thought out. Nor does server crash recovery seem well
thought out in X.
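
The loop described above can be shown with a toy model. This is only a
sketch under stated assumptions, not the BSD code: struct toy_conn,
toy_input_rst, toy_run and the any_segment_clears_idle flag are all
hypothetical names, and the probe limit is an arbitrary parameter.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_conn {
    uint32_t rcv_nxt;           /* next sequence number expected */
    uint32_t rcv_wnd;           /* receive window, assumed nonzero */
    int      probes_unanswered; /* probes sent since peer last heard */
    bool     reset;             /* set once a valid reset is accepted */
};

/* Handle an incoming reset.  With any_segment_clears_idle set, the idle
 * state is cleared before the reset is checked against the window
 * (the behaviour described in this article); otherwise only a valid
 * reset counts as hearing from the peer. */
void toy_input_rst(struct toy_conn *c, uint32_t seq,
                   bool any_segment_clears_idle)
{
    bool valid = (uint32_t)(seq - c->rcv_nxt) < c->rcv_wnd;

    if (any_segment_clears_idle || valid)
        c->probes_unanswered = 0;
    if (valid)
        c->reset = true;
}

/* Probe a rebooted peer for up to `rounds` rounds.  Returns true if the
 * connection is torn down (reset accepted or probes exhausted). */
bool toy_run(bool any_segment_clears_idle, int max_probes, int rounds)
{
    struct toy_conn c = { 1000, 4096, 0, false };
    int i;

    for (i = 0; i < rounds; i++) {
        if (c.probes_unanswered >= max_probes)
            return true;        /* connection timed out */
        c.probes_unanswered++;  /* send probe, ACK = rcv_nxt - 1 */
        /* peer answers with a reset whose SEQ is the probe's ACK */
        toy_input_rst(&c, c.rcv_nxt - 1, any_segment_clears_idle);
        if (c.reset)
            return true;        /* connection reset */
    }
    return false;               /* still "intact" after every round */
}
```

In this model toy_run(true, 8, 1000) never tears the connection down,
while toy_run(false, 8, 1000) times out once 8 probes have gone
unanswered.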
	-Mike Eisler {uunet,sun}!elxsi!mre