Xref: utzoo comp.sys.hp:6106 comp.unix.internals:89
Path: utzoo!attcan!uunet!aplcen!uakari.primate.wisc.edu!zaphod.mps.ohio-state.edu!tut.cis.ohio-state.edu!snorkelwacker!ai-lab!zurich.ai.mit.edu!cph
From: cph@zurich.ai.mit.edu (Chris Hanson)
Newsgroups: comp.sys.hp,comp.unix.internals
Subject: Re: hp-ux 7.0/800 select() strangeness?
Message-ID:
Date: 6 Sep 90 09:40:07 GMT
References: <90242.151955MAH@awiwuw11.wu-wien.ac.at>
Sender: news@ai.mit.edu
Organization: M.I.T. Artificial Intelligence Lab.
Lines: 57
In-reply-to: MAH@awiwuw11.wu-wien.ac.at's message of 30 Aug 90 15:16:05 GMT

   From: MAH@awiwuw11.wu-wien.ac.at (Michael Haberler)
   Date: 30 Aug 90 15:16:05 GMT

   I have encountered a strange behaviour of several programs which
   use select(2) on hp-ux 7.0 on the Series 800. All of these programs
   are 'ported' BSD code, so I have the suspicion there's something in
   common:

   It seems that programs which have select(2) in their inner loop
   sometimes start using enormous amounts of system cpu time, as if
   the select() call were returning immediately and the program were
   polling.

   Among those programs are Xemacs 18.55, Greg Minshall's tn3270, and
   named 4.8.3.

   Xemacs tends to do this especially if the X server terminates
   before emacs. I didn't find an explanation for named's behaviour.
   With tn3270, it looks like a modem disconnect, and thus eof on the
   tty, causes tn3270 to loop.

I managed to get emacs into that state last night, and debugged it.
What happened was as follows.

I normally run several subprocesses under emacs. At the time that the
problem occurred, there were two active subprocesses and two exited
subprocesses. Emacs still had all four subprocesses in its tables.

Emacs's command reader checks all of the subprocesses periodically for
input, using the `select' call on the input file descriptors of the
processes, and due to some peculiarities of its design, it was checking
all four of the subprocesses, even though two of them no longer existed.
This `select' call was returning with a single bit set, which indicated
that the input file descriptor from one of the dead subprocesses had
some input that could be read. Emacs then dutifully went into a `read'
call on that descriptor, which fortunately was set to non-blocking
mode, and the `read' call returned saying that of course there was no
data.

In summary: we have two processes and a pipe from one to the other.
The read side of the pipe has been set to non-blocking mode by the use
of O_NONBLOCK. The process on the write side of the pipe finishes by
calling `exit'. The process on the read side receives SIGCHLD and uses
`waitpid' to extract the exit status of the now-dead subprocess. It
then does a `select' on the read side of the pipe, which returns
indicating that the pipe has some data to be read. The process calls
`read' on the pipe, which returns zero indicating no data is available.
The `select' and `read' calls then repeat in the same way, indefinitely.

Now I'm no expert, but it's my belief that `select' shouldn't indicate
that the pipe has input in this situation.

For information: this behavior has also been observed (by others) when
the subprocess uses a PTY to communicate with emacs, although that case
has not been debugged or thoroughly examined.

PS: Emacs is being changed so that it does not attempt to use `select'
on connections to dead processes. Version 18.56 will not have this
problem. If anyone is interested in a patch for 18.55, they should
contact me directly by e-mail.
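
For anyone who wants to try the sequence from the summary on their own
system, here is a minimal sketch in Python, whose os and select modules
wrap the same system calls. It is not the Emacs code: the function name
probe_dead_writer_pipe and the one-second timeout are made up for
illustration, and the child here writes nothing before it exits.

```python
import fcntl
import os
import select


def probe_dead_writer_pipe(timeout=1.0):
    """Return (readable, data) for a pipe whose only writer has exited.

    Hypothetical helper, not taken from Emacs: it only replays the
    pipe / exit / waitpid / select / read sequence from the summary.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: owns the write side and exits without writing anything.
        os.close(r)
        os._exit(0)

    # Parent: keep only the read side and put it in non-blocking mode.
    os.close(w)
    flags = fcntl.fcntl(r, fcntl.F_GETFL)
    fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    # Collect the exit status of the now-dead subprocess.
    os.waitpid(pid, 0)

    # Ask select whether the read side is ready, then try to read it.
    ready, _, _ = select.select([r], [], [], timeout)
    readable = r in ready
    data = None
    if readable:
        try:
            data = os.read(r, 512)
        except BlockingIOError:
            # Non-blocking read with nothing available (EAGAIN).
            data = None
    os.close(r)
    return readable, data


if __name__ == "__main__":
    print(probe_dead_writer_pipe())
```

On systems that follow current POSIX behavior, I would expect this to
report the descriptor as readable and the read to return zero bytes,
since a zero-length read on a pipe with no writers is how end-of-file
is signalled there. A caller that keeps selecting on the descriptor
after that zero-length read will spin in the way described above, so
the usual remedy is to drop the descriptor from the select set once
the read returns zero.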