Path: utzoo!attcan!uunet!lll-winken!ames!think!barmar
From: barmar@think.COM (Barry Margolin)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Message-ID: <20761@news.Think.COM>
Date: 25 May 89 16:32:31 GMT
References: <8905250638.AA21706@ucbvax.Berkeley.EDU>
Sender: news@Think.COM
Reply-To: barmar@kulla.think.com.UUCP (Barry Margolin)
Organization: Thinking Machines Corporation, Cambridge, MA
Lines: 49

In article <8905250638.AA21706@ucbvax.Berkeley.EDU> dcrocker@AHWAHNEE.STANFORD.EDU (Dave Crocker) writes:
>It is worth adding that the excessive use of keepalives has removed a
>feature that used to be in TCP and has been recently re-documented by
>Bob Braden: TCP used to be remarkably robust against temporary
>outages. If you were willing to wait, so was TCP. Now, an outage of
>a very short time -- on some implementations, as short as 1-2 minutes --
>will abort the connection.

I dispute this claim. TCP is only robust against temporary outages if
you don't try to use the connection during that period. For instance,
if I'm using telnet, the connection will stay alive during outages if
I don't type anything to the client or the host doesn't try to send
any output. If either end tries to use the connection, and the outage
is longer than the TCP acknowledgement timeout, then the connection
will die. If I happen to know that the network is having trouble I
won't type anything, but how often is this the case? What it mostly
means is that a temporary outage after I go home won't break my
connections.

TCP's robustness is still a good idea. It's nice to be able to swap
Ethernet cables without causing all the network connections to die.
But in my experience (which, I admit, isn't all that extensive), any
connection that dies for more than a minute or two probably isn't
going to come back.

What I mostly care about, though, is that the other end definitely has
reinitialized, e.g. it has crashed and been rebooted.
If it's a telnet server that crashed I can do this by typing into the
client, which will provoke a reset, and the client will abort. But if
it's the telnet client or an X server that died, there's often no way
to force the other end to try to send something so it will get a
reset.

I think the right solution is a compromise. What's needed is a way to
send a segment with infinite (or near-infinite, e.g. hours or a day)
retransmissions and slow retransmit rate (one to two minutes). This
would allow idle connections to stay up across most network failures,
but they will die within a minute or so of the other end rebooting.
And, of course, it should be optional, so that applications that
perform frequent output of their own need not compound their network
use (although since keepalives need only be sent when there are no
normal packets in the retransmit queue, any application whose output
rate is higher than the keepalive rate will never invoke the keepalive
mechanism).

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
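
A minimal sketch of the compromise described above, assuming a sockets
API that exposes per-connection keepalive tuning. The TCP_KEEPIDLE,
TCP_KEEPINTVL, and TCP_KEEPCNT options are platform-specific and are
not mentioned in the post; the function names and the numeric values
are illustrative assumptions, chosen to give a slow probe rate (two
minutes) and a long give-up time (about four hours).

```python
import socket


def enable_patient_keepalive(sock, idle_s=120, interval_s=120, probes=120):
    """Opt one connection in to slow, long-lived keepalive probing.

    Hypothetical helper; the defaults are illustrative, not recommended values.
    Returns a dict of the tuning options that were actually applied.
    """
    # Keepalive stays optional: it is enabled per socket, by the application.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # gap between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in wanted:
        opt = getattr(socket, name, None)  # not every platform defines these
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
        except OSError:
            # The platform rejected the option or value; keep its default.
            pass
    return applied


def outage_tolerance_s(idle_s, interval_s, probes):
    """Rough time an idle connection survives with no answer from the peer."""
    return idle_s + interval_s * probes
```

With these values an idle connection should ride out an outage of
roughly outage_tolerance_s(120, 120, 120) seconds, a little over four
hours, while a peer that has rebooted would normally answer the first
probe it receives with a reset and so end the connection within about
one probe interval.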