Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!rice!sun-spots-request
From: dan@bbn.com
Newsgroups: comp.sys.sun
Subject: Re: Lock Daemon, lockf fails
Keywords: SunOS
Message-ID: <4885@brazos.Rice.edu>
Date: 9 Feb 90 19:05:03 GMT
Sender: root@rice.edu
Organization: Sun-Spots
Lines: 38
Approved: Sun-Spots@rice.edu
X-Refs: Original: v9n16
X-Sun-Spots-Digest: Volume 9, Issue 36, message 10

With regard to problems with Sun's locking daemon and network locking in
general, here are some problems we've found:

1. Sun's lockd and statd (3.4) are awful in the presence of other machines
   implementing NFS but not lockd, such as Ultrix machines for Ultrix < 3.0.
   If you NFS-mount such a machine's filesystem onto a Sun, then try (on
   the Sun) to lock a file on the Ultrix machine, your process will hang
   forever, unkillable. To get around this we had to write a test
   preceding the locking call that sends an inquiry to the host holding
   the file to be locked; if we learn that no lockd or statd exists (you
   must check for both) then we use local locking on the file instead.

2. Other vendors are no better. The Ultrix 3.0 lockd sometimes pauses for
   2 minutes when you first try to use it in a process. We are still
   tracking this one down, but it seems to depend on configuration issues
   like where you're getting your hostnames. A given configuration (i.e.,
   /etc/svcorder, /etc/hosts, etc. and the up/down status of the other
   machines NFS-mounted to the one in question) will either always show
   this problem or never show it. A trace of the process shows it
   repeatedly sending a message to some other host and waiting 5 seconds
   for a response.

3. Another problem we have seen under Ultrix 3.0 is that it often takes
   several (non-blocking) fcntl calls before a file lock is granted. We're
   not sure what precipitates this behavior. We've seen this after locking
   and unlocking a file with one process: when we try to lock the same
   file through another process, several attempts are required.
   (To demonstrate this bug, run a test program that simply calls fcntl
   in a loop, reporting the number of iterations necessary to acquire a
   lock.) There are patches for this bug, once you realize what's going
   on. (On a DECstation, you should upgrade to 3.1 before applying the
   patches; they don't work so well under 3.0.)

4. It's worth pointing out a SunOS fcntl locking "feature" that may not be
   obvious: if you open() 2 file descriptors on a single file, fd1 and
   fd2, establish a lock on fd1 and then close(fd2), the lock established
   through fd1 is lost.

Mark Sommer and Dan Franklin
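
For item 3, the kind of test program described there might look roughly
like the sketch below. It is not the program we actually ran; the function
name count_lock_attempts and the max_tries limit are made up for
illustration. It asks for an exclusive whole-file lock with non-blocking
F_SETLK and reports how many calls were needed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: request an exclusive whole-file lock with
 * non-blocking F_SETLK, retrying while the lock is reported busy.
 * Returns the number of fcntl calls needed (1 = granted at once),
 * or -1 on an unexpected error or if max_tries is exhausted. */
int count_lock_attempts(const char *path, int max_tries)
{
    struct flock fl;
    int fd, tries;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "through end of file" */

    for (tries = 1; tries <= max_tries; tries++) {
        if (fcntl(fd, F_SETLK, &fl) == 0) {
            close(fd);        /* closing the descriptor drops the lock */
            return tries;
        }
        if (errno != EACCES && errno != EAGAIN)
            break;            /* a real error, not just "lock busy" */
    }
    close(fd);
    return -1;
}
```

Call it from a small main() with the path of a file on the NFS mount in
question and print the result. With no other lock holder, a well-behaved
system should return 1; the behavior in item 3 would show up as a larger
count.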
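
The behavior in item 4 can be checked with something like the following
sketch. The function names are ours, invented for this example. A process
cannot see its own locks with F_GETLK, so the check is done from a forked
child, which does not inherit the parent's record locks. The sketch assumes
fcntl record locks that belong to the process rather than to the
descriptor, which is what POSIX specifies.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child and ask, via F_GETLK, whether a
 * write lock on the whole file would conflict with an existing lock.
 * Returns 1 if another process (here, the parent) holds a lock,
 * 0 if the file is unlocked, -1 on error. */
static int lock_seen_by_child(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl;
        int fd = open(path, O_RDWR);

        if (fd < 0)
            _exit(2);
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    status = WEXITSTATUS(status);
    return status == 2 ? -1 : status;
}

/* Open the same file twice, lock it through fd1, then close fd2.
 * Returns 1 if the lock is still held afterwards, 0 if it was
 * dropped, -1 on error. */
int lock_survives_second_close(const char *path)
{
    struct flock fl;
    int fd1, fd2, before, after;

    assert(path != NULL);
    fd1 = open(path, O_RDWR | O_CREAT, 0644);
    if (fd1 < 0)
        return -1;
    fd2 = open(path, O_RDWR);
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd1, F_SETLK, &fl) < 0) {
        close(fd1);
        close(fd2);
        return -1;
    }

    before = lock_seen_by_child(path);   /* should be 1: lock is held */
    close(fd2);                          /* the lock was NOT taken via fd2 */
    after = lock_seen_by_child(path);
    close(fd1);

    if (before != 1 || after < 0)
        return -1;
    return after;
}
```

A return value of 0 means the lock taken through fd1 disappeared when fd2
was closed, which is the behavior described in item 4. If your program
opens a locked file a second time anywhere (a library routine reading it,
say), expect to lose the lock when that second descriptor is closed.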