Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!psuvax1!rutgers!mit-eddie!xn.ll.mit.edu!rkc
From: rkc@xn.ll.mit.edu
Newsgroups: comp.unix.wizards
Subject: file locking issues, NFS, lockf
Message-ID: <1991Apr30.192117.4730@xn.ll.mit.edu>
Date: 30 Apr 91 19:21:17 GMT
Organization: MIT Lincoln Laboratory
Lines: 70

=This is a slight modification of a posting that occurred in comp.sys.sun.
=I received only a few answers, which seemed to open as many questions as they
=answered. I now call upon the unix wizards to help me out.

I have written an application that is similar to a network database
application, in which data is stored in an NFS-accessible file. To protect
against multiple simultaneous updates, I have used the lockf subroutine to
lock the entire file.

I have had numerous problems with the lockf routine "locking up". The
symptoms vary:

S1. The client dies and the server doesn't realize it. To avoid processes
    being killed while they own the lock, I catch the following signals:

        signal( SIGHUP,  clnp );
        signal( SIGQUIT, clnp );
        signal( SIGINT,  clnp );
        signal( SIGILL,  clnp );
        signal( SIGIOT,  clnp );
        signal( SIGEMT,  clnp );
        signal( SIGFPE,  clnp );
        signal( SIGBUS,  clnp );
        signal( SIGSEGV, clnp );
        signal( SIGSYS,  clnp );
        signal( SIGTERM, clnp );

    Should I catch more? FYI, here's what the lock code looks like:

        /* Poll for the lock instead of blocking in F_LOCK. */
        for( NumAttempts = 0; NumAttempts <= NUMPOLLS; NumAttempts++ ) {
            if( lockf( fd, F_TLOCK, 0L ) != (-1) ) {
                success = TRUE;
                break;
            }
            sleep(2);
        }

    I avoid the indefinite-wait lock (F_LOCK) because it appears to increase
    the probability that an error will occur.

S2. Sometimes the client doesn't die; it just hangs. Attaching a debugger to
    the hung program shows that it is stuck inside fcntl.

S3. Occasionally, I get messages like

        unknown klm_reply proc(0)
        unknown klm_reply proc(40)

    Does anyone have any idea where these come from?

Other questions include:

1. Is there any known way to unconfuse our machines and reset the lock state
   without rebooting them? Killing statd and lockd is not sufficient.

2. I was once told that Sun released patches to their lock daemon, but no one
   could direct me to them. Does a wizard know where such things exist?

3. If lockf cannot be made to work, would I be at risk using the old
   technique of creating a "lock directory"? I've read that this won't work
   with NFS, but I've never read a good explanation of the problems with
   this approach. Are there other workarounds (semaphores, etc.) that I
   should try? I would prefer to get this to work properly using lockf,
   since this seems to be exactly what lockf is designed for.

Our network consists of SPARCstation 1+ and IPC machines running either
4.0.1, 4.1 or 4.1.1, and Sun-3s running 4.0.3. In the near future we will
also be using DG's AViiON/UX workstations.

Thanks for any help you can provide,

-Rob
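
For concreteness, the "lock directory" technique from question 3 might look roughly like the sketch below. The names dirlock_try, dirlock_acquire, and dirlock_release are made up for illustration, and the polling mirrors the lockf loop shown under S1. The sketch relies on mkdir() failing with EEXIST when the directory already exists; whether that behaviour can be trusted over NFS is exactly the open question, so this only illustrates the idea and is not a tested fix.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: make one attempt to take the lock by creating
 * the directory `lockdir`.
 * Returns 0 if the lock was taken, 1 if it is already held
 * (directory exists), and -1 on any other error.
 */
int dirlock_try(const char *lockdir)
{
    if (mkdir(lockdir, 0700) == 0)
        return 0;
    return (errno == EEXIST) ? 1 : -1;
}

/*
 * Hypothetical helper: poll for the lock, making up to polls + 1
 * attempts and sleeping `delay` seconds between attempts.
 * Return values are the same as for dirlock_try().
 */
int dirlock_acquire(const char *lockdir, int polls, unsigned int delay)
{
    int i, rc;

    for (i = 0; i <= polls; i++) {
        rc = dirlock_try(lockdir);
        if (rc <= 0)            /* acquired (0) or hard error (-1) */
            return rc;
        if (i < polls)
            sleep(delay);       /* lock is held; wait and retry */
    }
    return 1;                   /* still held after all attempts */
}

/* Hypothetical helper: release the lock by removing the directory. */
int dirlock_release(const char *lockdir)
{
    return rmdir(lockdir);
}
```

A caller would wrap each update in dirlock_acquire(path, NUMPOLLS, 2) and dirlock_release(path). Note that, unlike a lockf lock, the directory is not removed when the owning process dies, so a crashed client leaves a stale lock that has to be cleaned up by hand.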