Path: utzoo!utgpu!watserv1!watmath!att!att!linac!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!emory!hubcap!gatech!purdue!mentor.cc.purdue.edu!noose.ecn.purdue.edu!en.ecn.purdue.edu!milton
From: milton@en.ecn.purdue.edu (Milton D Miller)
Newsgroups: comp.windows.x
Subject: Re: R5 wish list -- client recovery
Summary: keepalives, if done right are ok; see also the discussion in comp.protocols.tcp-ip
Keywords: Keep Alives, TCP
Message-ID: <1990Nov20.001707.11343@en.ecn.purdue.edu>
Date: 20 Nov 90 00:17:07 GMT
References: <9011191810.AA01979@milton.u.washington.edu>
Organization: Purdue University Engineering Computer Network
Lines: 69

In article <9011191810.AA01979@milton.u.washington.edu>, donn@MILTON.U.WASHINGTON.EDU (Donn Cave) writes:
>excerpts from <6220@lanl.gov> (Dale Carstensen):
>
>> ... with X11R3 and X11R4 ... if the server dies ... remote clients that
>> had been connected to it continue to run, usually.
>
>Even if you wanted to use xdm, it isn't a complete cure for this one, since
>it only gets the clients directly descended from it - not clients started from
>the shell command line, not clients running on other hosts.
>
>We were able to install a fairly trivial patch to Xlib, so that the socket
>is created with a keep-alive option. ...
>Unfortunately, it seems to slightly aggravate Problem 2:
>
>> On the other hand, unreliability in the network connection between client
>> and server can terminate clients while the server is still running.
>
>We get a lot of "Network down" (E_NETDOWN) errors, particularly on one host
>whose Ethernet hardware is not the industry's best. ....

There is currently a discussion in comp.protocols.tcp-ip about (not using)
keepalives.  As was pointed out there today:

> From: barmar@think.com (Barry Margolin)
> Subject: Re: Warning: Keep-Alive considered harmful
[excerpt follows:]
... The connection shouldn't be killed as a result of keep-alive timeouts.
Instead, the purpose of keep-alives should be to elicit RSTs from the other
host.  Timeouts can be due to any number of reasons, but a RST indicates
unambiguously that the connection is unusable, because the other end
rebooted or closed the connection itself (perhaps network problems
prevented the FIN from getting through).  If a host crashes, the keepalive
won't actually notice this until it comes back up, which is probably good
enough.
[end of excerpt]

I agree :-)

Also notice, in the case of X terminals, there is usually a "close all
connections" option, which is essentially a reboot of the TCP.  The other
end probably is not given a FIN or RST, and the condition won't show up
until the other end is poked.  (For xterms, invoking "write" to yourself
will usually push the connection into destruction, and may return a
"Not logged on there" to your write command.)

>I'm told that
>ftp and telnet re-try when they encounter these conditions - would such
>re-trying be another trivial modification to Xlib?

It is the responsibility of TCP to do the retrying.  It *should* be up
to the application when to give up (see also the Host Requirements RFC),
but that is not generally available :-(.

>Are there fundamental
>reasons why X should give up immediately when it encounters this network error?
>

None that I can think of; the servers already buffer for each client, and
I don't see why the clients shouldn't buffer for the server when not
explicitly requesting a sync (do they do this already?).  They (clients)
may need to give up if the buffering is taking too much space, and
response time may suffer for no apparent reason (to the other
servers/users) if multiple servers are open.

milton
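
A rough illustration of the two ideas discussed above: turning on the
keep-alive option for a socket, and treating only a positive rejection
from the peer (such as an RST) as grounds to give up, while "Network down"
and timeouts are left for a retry.  This is only a sketch in Python, not
the Xlib patch itself; the names enable_keepalive, should_give_up,
FATAL_ERRNOS and TRANSIENT_ERRNOS are made up for the example, and the
split of errno values into the two sets is an assumed policy, not anything
Xlib or TCP prescribes.

```python
import errno
import socket

# Assumed policy: errors meaning the peer positively rejected the connection
# (for example, it answered a probe with an RST).
FATAL_ERRNOS = {errno.ECONNRESET, errno.EPIPE, errno.ECONNREFUSED}

# Assumed policy: errors that may be transient, like the "Network down"
# reports quoted in the post, or a plain timeout.
TRANSIENT_ERRNOS = {
    errno.ENETDOWN,
    errno.ENETUNREACH,
    errno.EHOSTUNREACH,
    errno.ETIMEDOUT,
}


def enable_keepalive(sock):
    """Set SO_KEEPALIVE on sock and report whether the option is now on."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def should_give_up(err):
    """Return True only when the error says the connection is unusable.

    Transient errors and anything unrecognised return False, leaving the
    decision to retry (or to wait) with the caller.
    """
    return err.errno in FATAL_ERRNOS
```

A client following this policy would call enable_keepalive() right after
creating its socket, and on an OSError from a read or write would exit only
when should_give_up() returns True.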