Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!ucsd!ucsdhub!hp-sdd!ncr-sd!ncrlnk!uunet!mcvax!kth!enea!maxim!prc
From: prc@maxim.ERBE.SE (Robert Claeson)
Newsgroups: comp.databases
Subject: Re: Strange problems with Informix Perform
Keywords: Informix perform nohangup?
Message-ID: <482@maxim.ERBE.SE>
Date: 4 Feb 89 17:19:28 GMT
References: <34891@codas.att.com> <4723@sfsup.UUCP>
Organization: ERBE DATA AB
Lines: 41

In article <4723@sfsup.UUCP>, jdt@sfsup.UUCP (J Tais) writes:
> In article <34891@codas.att.com>, jaa@codas.att.com (James Anderson) writes:
> > A user brings up the perform screen, enters the Add or Update mode.
> > Something happens to the dialin port they are on and they get disconnected
> > from the system. The Informix process starts to run wild grabbing as much
> > usr and sys tics it can get (can tell by reading sar).

> > a Who shows the user still logged in with no idle time. A stat on the tty
> > usualy shows 0 idle read time and a write idle time (sometimes no write idle).

> I remember a similar problem on my last project, but it happened when users
> were rlogin'ed over TCP/IP and running perform on the remote machine. We
> had a persistent problem with rogue perform processes grabbing all kinds of
> cpu time when users disconnected in abnormal ways or were terminated by the
> idle-line watcher.

I've seen this behaviour in far too many software packages; Informix
and Oracle are just two of them. What I think happens is that these
packages ignore SIGHUP and rely on the return code from the write()
and read() system calls to determine when a user has been disconnected.
On many machines, the return code is 0 when the disconnect occurs on a
dialup port (meaning "0 characters read/written") and -1 when a network
connection is disconnected (meaning "error"; errno is set to some
reasonable value).
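
For illustration only, here is a rough sketch of how a program could tell
these return codes apart when reading from the user's terminal. It is not
taken from Informix or Oracle; the names read_once and io_status are made
up for this example. The assumption is that the only failure worth
retrying is a read() interrupted by a signal (errno == EINTR), and that
both 0 and any other -1 mean "stop and clean up".

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical result codes, invented for this sketch. */
enum io_status { IO_OK, IO_HANGUP, IO_ERROR };

/*
 * Read once from fd into buf. Retries only when a signal interrupted
 * the call; every other failure is reported to the caller.
 * On IO_OK, *nread holds the number of bytes read (always > 0).
 */
enum io_status read_once(int fd, char *buf, size_t len, ssize_t *nread)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);

    if (n > 0) {
        *nread = n;
        return IO_OK;
    }

    *nread = 0;
    /* 0 means end of input (e.g. the line went away); -1 is an error. */
    return (n == 0) ? IO_HANGUP : IO_ERROR;
}
```

A caller would leave its input loop on either IO_HANGUP or IO_ERROR
instead of calling read() again, so a dead line cannot turn into a busy
loop.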

So these packages examine the return code, see a 0 or a -1, and the
program logic decides "heck, sumthin' went wrong, let's try it again".
And off we go. Some packages interpret the 0 return code as a hangup
indication, but treat -1 as some kind of error for which the fix is to
retry the operation until it succeeds. If the connection is gone, the
retry never succeeds, so the process spins and eats CPU time.

Note that I'm not saying this is what happens in all packages. I just
happen to know that this is the way it happens in some packages. In
fact, I haven't got the faintest idea what Oracle and Informix do. I've
just seen it happen to both of them, but in Oracle's case only when the
disconnect occurred on a TELNET/rlogin connection.

-- 
Robert Claeson, ERBE DATA AB, P.O. Box 77, S-175 22 Jarfalla, Sweden
"No problems." -- Alf
Tel: +46 758-202 50  EUnet:    rclaeson@ERBE.SE  uucp:   uunet!erbe.se!rclaeson
Fax: +46 758-197 20  Internet: rclaeson@ERBE.SE  BITNET: rclaeson@ERBE.SE