Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!rice!sun-spots-request
From: teraida!mikel@decwrl.dec.com (Mikel Lechner)
Newsgroups: comp.sys.sun
Subject: Re: mounted machine down => df hangs
Keywords: Networks
Message-ID: <3492@brazos.Rice.edu>
Date: 3 Dec 89 08:06:09 GMT
Sender: root@rice.edu
Organization: Sun-Spots
Lines: 36
Approved: Sun-Spots@rice.edu
X-Refs: Original: v8n180, Replies: v8n188 v8n193 v8n194 v8n206
X-Sun-Spots-Digest: Volume 8, Issue 213, message 7 of 12

kae@ihlpm.att.com (Kenneth A Edwards) writes:
>In article <2652@brazos.Rice.edu> rush@xanadu.llnl.gov (Alan Edwards) writes:
>>
>>When one of our disk servers goes down, doing a 'df' on a machine that has
>>the one of the disk server's partitions mounted, causes the 'df' process
>>to hang PERMANENTLY. The df process cannot be killed by kill -9. Is

>This is not likely to be fixed (and isn't fixed) in 4.0.3, since the
>problem is inherent in the definition of how "hard" (the default mount)
>NFS works. There are a couple of things you can do:

Actually, release 4.0 introduced a new bug with NFS-mounted filesystems:
processes that are not accessing the downed system at all still hang
waiting on the dead server.

Under 3.5 and previous releases it was possible to work around this
problem by mounting all NFS filesystems in a directory under root, with a
separate directory for each server. For example, "/hosts/sun1/disk1" would
be a mount point for an NFS filesystem under this scheme. Then, if your
process is in directory "/hosts/sun2/disk1", the library call "getwd()"
can safely step up the directory tree without touching a dead NFS mount
point belonging to another server. This worked just fine for us until
release 4.0.

With SunOS 4.0 and later releases, Sun introduced a "performance
improvement" to the "getwd()" library call. The library function ends up
"stat()"ing virtually all your mounted filesystems every time your program
tries to compute its working directory.
This is nearly guaranteed to hang any process making a "getwd()" call
when a hard-mounted NFS filesystem hangs. IMHO a process that is not
accessing data on a dead NFS server should not hang waiting on that
server, but it does.

Can we have the slow "getwd()" call back? :^(

Mikel Lechner                   UUCP: mikel@teraida.UUCP
Teradyne EDA, Inc.