Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!rpi!bu.edu!att!pacbell.com!ames!pasteur!ucbvax!LCS.MIT.EDU!MAP
From: MAP@LCS.MIT.EDU (Michael A. Patton)
Newsgroups: comp.protocols.tcp-ip
Subject: Warning: Keep-Alive considered harmful
Message-ID: <9011170344.AA20268@gaak.LCS.MIT.EDU>
Date: 17 Nov 90 03:44:04 GMT
References: <1990Nov16.164448.9918@bwdls61.bnr.ca>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 74

Warning: Keep-Alive considered harmful!!!

   Date: 16 Nov 90 16:44:48 GMT
   From: van-bc!ubc-cs!news-server.csri.toronto.edu!utgpu!cunews!bnrgate!bwdls61.bnr.ca!usenet@ucbvax.berkeley.edu (Peter Whittaker)

(Boy, what a bogus address. Good thing you put one in your signature.
You may not be able to help the enormous length, but could you try to
get your mail service to put YOU, not your daemon, in the header?)

   So, are there in fact substantive reasons not to use SO_KEEPALIVE?

Yes, indeed there are. The implementation of keep-alives at the TCP
level is just WRONG. Although the actual details are (usually) not a
strict violation of the spec, they do stretch it in interesting ways,
and I have seen at least one implementation that would sometimes RESET
a connection in response to a keep-alive packet.

If you want the functionality that you think you get automagically by
setting this option, you should think harder about the specifics.
What functionality do you really want? Do you just want to free up
the socket or kill a server? Why do you care? Are you attacking a
symptom rather than the real problem? Usually, the application needs
much better control over how it really works than it gets from an
option that simply causes it to crash if something goes wrong. You
should think about the many effects that can influence this (I'll list
a few presently) and consider whether you want these in your
application domain.
Beware that you may only intend to run your application in a limited
context, but eventually someone will try it over a different domain.
Please consider the different cases before going for options like
this.

One thing to consider is that there are links in the world where, at
times, only one packet every 5-10 minutes actually gets through. Do
you want to forbid the use of your service over such a link?

Just this morning I had a user complain that they couldn't FTP a file
between two distant hosts. The problem turned out to be a link that
dropped out for several minutes every half hour or so, while the
transfer time for the file they wanted was 45 minutes. Every time the
link dropped out, they got punted because of keep-alives and had to
start over. If the FTP server hadn't been running keep-alives, it
would have worked. The person doing the transfer had enough patience;
if only the computer had.

Another thing to consider is that in any application with a person in
control, that person is usually a better watch-dog timer than any
program you build. This might suggest building some user-interface
feature to help. The hash marks in some versions of FTP are one
example. Or you might build the high-level timeout but, rather than
punting the operation, merely print a message telling the user how
they could punt it themselves.

There are a few other general things to watch out for when building
any distributed system; here are a few related to the keep-alive
question. Never predefine a timeout; it will eventually be too small.
As an example, I have an FTP script with a global timeout on each
transfer, which made it quit and go on to the next one. I discovered
that for one combination of server and file, FIVE HOURS was not
enough; upping it to 10 got that file transferred (but wreaked havoc
with my other assumptions :-).

Well, there are several more things I could bring up, but I seem to
have rambled on for over a page already, so I'll cut it short here.
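
To make the "print a message rather than punt" idea concrete, here is
a minimal sketch in Python. Everything in it is an assumption made
for illustration: the class name StallAdvisor, its method names, and
the 60-second figure in the usage note are hypothetical and do not
come from any FTP implementation. The threshold only decides when to
speak up; nothing here ever closes the connection.

```python
import time


class StallAdvisor:
    """Advise the user when a transfer goes quiet; never abort it.

    quiet_seconds is a hypothetical advisory threshold, not a timeout:
    crossing it produces one message per stall and nothing else.
    """

    def __init__(self, quiet_seconds, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock              # injectable so it can be tested
        self.last_progress = clock()    # time of the last data received
        self.warned = False             # True once this stall was reported

    def progress(self):
        """Call whenever data arrives; ends any current stall."""
        self.last_progress = self.clock()
        self.warned = False

    def check(self):
        """Return an advisory message once per stall, otherwise None."""
        idle = self.clock() - self.last_progress
        if idle >= self.quiet_seconds and not self.warned:
            self.warned = True
            return ("No data for %d seconds; the link may be down. "
                    "Still waiting. Interrupt the transfer if you want "
                    "to give up." % idle)
        return None
```

A transfer loop might create StallAdvisor(60), call progress() after
each successful read, call check() once per pass, and print whatever
message comes back. The person at the keyboard stays the watch-dog
and decides whether a quiet link is worth waiting for.
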
I hope this helps to answer your questions and give you some ideas as
to why keep-alive is considered harmful. Hopefully it will point you
in a useful direction for developing what you really need as well.

            __
  /|  /|  /|  \         Michael A. Patton, Network Manager
 / | / | /_|__/         Laboratory for Computer Science
/  |/  |/  |atton       Massachusetts Institute of Technology

Disclaimer: The opinions expressed above are a figment of the phosphor
on your screen and do not represent the views of MIT, LCS, or MAP. :-)