Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!tut.cis.ohio-state.edu!pt.cs.cmu.edu!dsl.pitt.edu!pitt!amanue!oglvee!jr
From: jr@oglvee.UUCP (Jim Rosenberg)
Newsgroups: comp.unix.wizards
Subject: Is HDB locking safe?
Message-ID: <577@oglvee.UUCP>
Date: 6 Aug 90 15:18:53 GMT
Organization: Oglevee Computer Systems, Connellsville, Pa
Lines: 59

I have a program which I need to be mutually exclusive with uuxqt.  I've
employed what I thought was pretty straight forward HDB locking using a lock
file, but wasn't sure how to handle one particular problem, & now I don't see
how HDB can handle it correctly either.

HDB assumes that if the pid recorded in the lock file no longer corresponds to
an active process, the lock file is defunct and can safely be removed.  I can't
for the life of me figure out a safe way of doing this.  You can tell if
there's an active process for the pid by giving it a kill() with a signal
number of 0.  Now suppose you get back ESRCH for errno and conclude that the
process holding the lock is no longer active.  What do you do?  "Elementary, my
dear Watson, you remove the lock file!"  *** NOT SO FAST ***, Holmes.  To
unlink the lock file, the only thing you can supply to a system call is the
*name* of the file.  There is no way (so far as I know) to unlink by i-number.
There's a narrow window in which another process may be doing exactly the same
thing.  You have no guarantee that the LCK.X file you just unlinked is in fact
the same inode as the one from which you read the pid that you concluded is no
longer active.

Here's an example.  This code is taken from pcomm 1.1, which is hideously out
of date, but I had it lying around; it's a good example of some code written by
somebody who took some care and *thought* he was doing the right thing:

    static int
    checklock(lockfile)
    char *lockfile;
    {
        ...
        if ((lfd = open(lockfile, 0)) < 0)
            return(0);
        ...
        if ((kill(lckpid, 0) == -1) && (errno == ESRCH)) {
            /*
             * If the kill was unsuccessful due to an ESRCH error,
             * that means the process is no longer active and the
             * lock file can be safely removed.
             */
            unlink(lockfile);
            sleep(1);
            return(1);
        }

In this code there is no guarantee that lfd and lockfile correspond to the same
file at the time of the unlink.  I've wracked my brains trying to think of a
safe way to do this, and can't think of one.  How does HDB do it??  Is HDB lock
file handling *in fact vulnerable* to this narrow window problem?

One thing I thought of was to link the lockfile to a temp file, stat the temp
file before the unlink, stat it again afterwards; if the link count fails to go
down you know you made a beeg booboo and nuked an active lock file.  But then
what?  You can't put the lock file back -- you don't have a link to it.

Help!
-- 
Jim Rosenberg             #include                     --cgh!amanue!oglvee!jr
Oglevee Computer Systems                                /      /
151 Oglevee Lane, Connellsville, PA 15425            pitt!  ditka!
INTERNET:  cgh!amanue!oglvee!jr@dsi.com                 /      /
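
To make the window described above concrete, here is a minimal sketch in C, not taken from pcomm or HDB. The names pid_is_gone() and remove_if_stale() are hypothetical, and the sketch assumes the lock file holds the pid as ASCII decimal. It compares the inode behind the open descriptor with the inode the name currently points to, and only then unlinks. That check narrows the window but does not close it: the name can still be replaced between the stat() and the unlink(), which is exactly the problem the post asks about.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 1 if no process with this pid exists, else 0. */
static int pid_is_gone(pid_t pid)
{
    /* Signal 0 performs the existence check without sending anything. */
    return pid > 0 && kill(pid, 0) == -1 && errno == ESRCH;
}

/*
 * Hypothetical helper: remove lockfile only if it looks stale.
 * Returns 1 if removed, 0 if left alone (or already absent), -1 on error.
 * ASSUMPTION: the pid is stored as ASCII decimal text.
 */
static int remove_if_stale(const char *lockfile)
{
    char buf[32];
    struct stat by_fd, by_name;
    ssize_t n;
    long pid;
    int fd, same;

    assert(lockfile != NULL);

    fd = open(lockfile, O_RDONLY);
    if (fd < 0)
        return errno == ENOENT ? 0 : -1;

    n = read(fd, buf, sizeof buf - 1);
    if (n <= 0 || fstat(fd, &by_fd) < 0) {
        close(fd);
        return -1;
    }
    buf[n] = '\0';
    pid = strtol(buf, NULL, 10);

    if (!pid_is_gone((pid_t)pid)) {
        close(fd);
        return 0;               /* holder is alive, or pid unreadable */
    }

    /*
     * Does the name still refer to the inode we read the pid from?
     * fd stays open so that inode number cannot be reused meanwhile.
     */
    if (stat(lockfile, &by_name) < 0) {
        close(fd);
        return errno == ENOENT ? 0 : -1;
    }
    same = by_name.st_dev == by_fd.st_dev && by_name.st_ino == by_fd.st_ino;
    close(fd);
    if (!same)
        return 0;               /* somebody replaced the lock file */

    /* The window is still open here: unlink() works on the name only. */
    return unlink(lockfile) == 0 ? 1 : -1;
}
```

A caller would try to create the lock, and on failure call remove_if_stale() and retry if it returns 1. Because of the remaining window, a retry loop built on this is only as safe as the post suspects HDB itself to be.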