Xref: utzoo comp.unix.questions:7063 comp.unix.wizards:8558
Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!oliveb!sun!gorodish!guy
From: guy@gorodish.Sun.COM (Guy Harris)
Newsgroups: comp.unix.questions,comp.unix.wizards
Subject: Re: Trouble killing processes
Message-ID: <53579@sun.uucp>
Date: 17 May 88 19:51:13 GMT
References: <3950@killer.UUCP> <3951@killer.UUCP> <216@obie.UUCP> <52288@sun.uucp> <7117@swan.ulowell.edu>
Sender: news@sun.uucp
Lines: 30

> The real fix would of course be in the kernel.  I would suggest setting
> a timeout on each system call.  This way, an lseek on a dead tape drive,
> say, would fail after n secs of cpu.  Some sort of context might need
> to be saved before the syscall starts, so things can be restored.  This
> could be expensive.  Comments?

Probably not a good idea.  "lseek" is a bad example; in all current UNIX
systems that I'm familiar with, "lseek" only sets a "seek pointer" in
memory - it never goes near the device.  This pointer is then used by the
driver to position the tape before doing any I/O operation.

A more germane example *might* be an I/O operation or a "position the
tape" "ioctl" operation on a dead tape drive, except that the *only*
reason this would require a timeout should be either that the tape driver
is buggy and doesn't immediately detect a dead drive or that it doesn't
have some timeout scheme *in the driver* to detect a dead drive.  Even
such a timeout could be tricky; some magtape operations can take a *very*
long time to complete.

Basically, system calls should take as long as they need to; this could
very well be infinite ("pause()" or "sigpause()") or, worse, finite but
indeterminate.  In either case, no timeout can be imposed.  A typical
"wedged" process is either waiting for something that *must* complete (in
which case its unkillability is unfortunate but unavoidable) or is hung
due to a kernel bug (in which case the real fix is, of course, in the
kernel - but it's not to kludge in a timeout).

(P.S. the timer obviously doesn't want to be based on CPU time - a
blocked process tends to consume CPU time *extremely* slowly, if at all.)
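
The point in the P.S. can be checked from user space.  The following is
only a rough sketch, assuming a POSIX system with "getrusage()",
"sigaction()", "alarm()" and "pause()"; the function names
"cpu_seconds" and "cpu_used_while_blocked" are made up for illustration
and come from nowhere in the kernel or the original discussion.  It
blocks the process in "pause()" for a given number of wall-clock seconds
and reports how much CPU time was charged to it in the meantime.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: its only job is to make pause() return on SIGALRM. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* CPU seconds (user + system) charged to this process so far. */
static double cpu_seconds(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);

    assert(rc == 0);
    (void)rc;
    return (double)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
         + (double)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

/*
 * Hypothetical helper: block in pause() for roughly wall_secs seconds of
 * wall-clock time and return the CPU seconds consumed while blocked.
 * wall_secs must be at least 1, since alarm(0) would cancel the alarm
 * and pause() would then never return; 0 is treated as "don't block".
 */
double cpu_used_while_blocked(unsigned wall_secs)
{
    struct sigaction sa;
    double before;

    if (wall_secs == 0)
        return 0.0;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1.0;            /* could not install the handler */

    before = cpu_seconds();
    alarm(wall_secs);           /* wake us up after wall_secs seconds */
    pause();                    /* blocked here, consuming no CPU */
    return cpu_seconds() - before;
}
```

On the systems I'd expect this to run on, "cpu_used_while_blocked(2)"
should come back as a small fraction of a second even though two
seconds of real time went by, which is why a limit of "n secs of cpu"
would essentially never fire on a process stuck in a system call.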