Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!bloom-beacon!athena.mit.edu!jik
From: jik@athena.mit.edu (Jonathan I. Kamens)
Newsgroups: comp.unix.wizards
Subject: Re: NFS, hung processes
Keywords: NFS,hang,process,server,client
Message-ID: <13134@bloom-beacon.MIT.EDU>
Date: 30 Jul 89 22:25:02 GMT
References: <24D1DF49.7A5@marob.masa.com>
Sender: daemon@bloom-beacon.MIT.EDU
Reply-To: jik@athena.mit.edu (Jonathan I. Kamens)
Organization: Massachusetts Institute of Technology
Lines: 42

Query: any reason why this wasn't asked in comp.protocols.nfs?

In article <24D1DF49.7A5@marob.masa.com> samperi@marob.masa.com (Dominick Samperi) writes:
>There seems to be no obvious way to deal with the problem of hung processes
>due to a dead NFS server. The recently posted 'cknfs' program might help
>somewhat, but it does not deal with the situation where a server dies after
>a user logs in. Furthermore, it appears that a process may still hang
>even if no reference is made to dead NFS paths. I don't know why...perhaps
>the crashed machine is flooding the network with garbage packets????
>
>Perhaps some experienced NFS users could comment on various tricks that they
>have used to deal with the "NFS hang problem"?

  Project Athena has well over 1,000 workstations, with over 10,000 user
accounts, and every user gets his home directory over NFS.  There are
also a lot of third-party lockers, exported via NFS, that people use
often.  We therefore encounter this problem much more often than we'd
like.

  The most common way of referencing a dead NFS path, even if you don't
realize you're doing it, is to have that path in your search path and
then try to execute a program and/or start a new shell.  Both cause the
search path to be scanned, and the scan can encounter the dead path and
hang on it.

  One solution, which is what we use, is not to hard mount anything but
the most important NFS filesystems.
We mount all user filesystems soft with a five-minute error timeout by
default, so if a user's fileserver goes down, processes will keep trying
to access it for only five minutes before giving up.  Once the user gets
his prompt back, he can carefully save whatever work he is doing to a
local hard disk or mail it to himself to prevent it from being lost.

  The only filesystems we hard mount by default are the system software
packs, since if they go down there isn't much you can do with the
workstation anyway.

Jonathan Kamens                       USnail:
MIT Project Athena                      432 S. Rose Blvd.
jik@Athena.MIT.EDU                      Akron, OH  44320
Office: 617-253-4261                  Home: 216-869-6432