Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!agate!apple!sun-barr!newstop!sun!terra!brent
From: brent%terra@Sun.COM (Brent Callaghan)
Newsgroups: comp.unix.wizards
Subject: Re: NFS, hung processes
Keywords: NFS,hang,process,server,client
Message-ID: <118821@sun.Eng.Sun.COM>
Date: 1 Aug 89 06:35:41 GMT
References: <24D1DF49.7A5@marob.masa.com> <13134@bloom-beacon.MIT.EDU>
Sender: news@sun.Eng.Sun.COM
Lines: 28

In article <13134@bloom-beacon.MIT.EDU>, jik@athena.mit.edu (Jonathan I. Kamens) writes:
>
> One solution, which is what we use, is not to hard mount anything
> but the most important NFS filesystems. We mount all user filesystems
> soft with a five minute error timeout by default, so if a user's
> fileserver goes down, processes will only try to access it for five
> minutes. Once the user gets his prompt back, he can carefully save
> whatever work he is doing to a local hard disk or mail it to himself
> to prevent it from being lost.

A problem with "soft" mounting is that a timed-out I/O will return
an error result to the user program. Unix programs are notorious
for not checking for error returns on read(), write(), etc., and can
fail in mysterious ways. This can be particularly bad in the case
of an executable that is running from a dead server. A pagein
that gets an error from a soft mount will crash the process and
leave a core dump.

I prefer to mount /usr and local executables ("/usr/local" around
here) with "hard" and set the "intr" option so that I can at least
kill a hung process with a SIGTERM if I get fed up waiting.
The "intr" should work OK - although it can take a while, since it
has to wait for the hung NFS operation to time out (which can take
a minute or so).

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
                         uucp: sun!bcallaghan
                         phone: (415) 336 1051
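
To illustrate the point about unchecked error returns on read() and write(): below is a minimal sketch in C of a write wrapper that actually looks at the result. The name write_all is hypothetical and does not come from the post. The errno a soft mount reports after a timeout is system dependent, so the sketch treats any error other than EINTR as fatal and hands it back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all (hypothetical helper): write all len bytes of buf to fd.
 * Returns 0 on success, or -1 with errno set by write() on failure.
 * A timed-out operation on a soft mount would surface here as a -1
 * return; which errno value it carries depends on the system.
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;      /* real error: report it, do not ignore it */
        }
        /* A short write is not an error; advance and keep going. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would then test the result, for example `if (write_all(fd, buf, n) != 0) perror("write");`, rather than assuming the data reached the server. The same pattern applies to read(), with the extra case that a return of 0 means end of file and not an error.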