Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!ncar!mephisto!mcnc!rti!trt
From: trt@rti.rti.org (Thomas Truscott)
Newsgroups: comp.unix.wizards
Subject: Re: Is HDB locking safe?
Summary: Yes, HDB locking is safe!
Message-ID: <4024@rtifs1.UUCP>
Date: 15 Aug 90 19:46:12 GMT
References: <577@oglvee.UUCP>
Organization: Research Triangle Institute, RTP, NC
Lines: 62

> ... HDB assumes that if the pid recorded
> in the lock file no longer corresponds to an active process, the lock file is
> defunct and can safely be removed.  I can't for the life of me figure out a
> safe way of doing this.

A crucial detail in recovering from a breakdown in the lock protocol
is avoiding a race between two or more processes that are
simultaneously attempting recovery.
Usually a strategic pause is all that is needed,
and as you can see in the HDB code below there is just such a pause.

> static int
> checklock(lockfile)
> char *lockfile;
> {
> ...
> 	if ((lfd = open(lockfile, 0)) < 0)
> 		return(0);
> ...
> 	if ((kill(lckpid, 0) == -1) && (errno == ESRCH)) {
> 		/*
> 		 * If the kill was unsuccessful due to an ESRCH error,
> 		 * that means the process is no longer active and the
> 		 * lock file can be safely removed.
> 		 */
> 		unlink(lockfile);
> 		sleep(5);		/* avoid a possible race */
> 		return(1);
> 	}
>
> In this code there is no guarantee that lfd and lockfile correspond to the
> same file at the time of the unlink.

But there *is* a guarantee -- the "sleep(5);"!!
[I changed the sleep() line to match the one in 4.3 BSD uucp "ulockf.c"]

Consider a process "X" that discovers that the locking process
has terminated.  X unlinks the lockfile, but then it *pauses*
before it attempts to grab the lock for itself (done by code not shown above).

Now consider scenario #1 for another process "Y":
At nearly the same instant Y discovers the dead lock,
so it also unlinks the lockfile (of course only one unlink can succeed)
and it *also pauses*.
Whenever X and/or Y resume there is no lock present,
so attempts to grab it proceed in the usual way (code not shown above).

Now consider scenario #2 for Y:
Just after X has unlinked the lockfile, Y calls checklock()
and discovers no lock is present.  No problem, it just attempts
to grab the lock in the usual way (code not shown above).
When X awakes from its slumber it will discover that Y
has already grabbed the lock, so X will just have to wait.

The HDB code is nice, but does have flaws:
(a) A "sleep(1);" is not enough to avoid a race on a very busy system.
(b) Lock recovery is obscure, so the sleep() call should be commented.
(c) Protocol breakdown is a bad thing, and should be reported:
	logent(lockfile, "DEAD LOCK");
The 4.3 BSD ulockf.c routine has all of these nice features.

	Tom Truscott
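
The recovery sequence walked through in scenarios #1 and #2 can be
sketched end to end as follows.  This is only an illustration under
stated assumptions, not the HDB or 4.3 BSD source: the function names
(read_lock_pid, grab_lock, acquire_lock) are hypothetical, the lock file
is assumed to hold the owner's pid as ASCII text, and an exclusive
open() with O_CREAT|O_EXCL stands in for the "usual way" of grabbing the
lock, which the quoted code does not show.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the pid stored in the lock file (assumed ASCII); -1 on failure. */
long read_lock_pid(const char *lockfile)
{
    char buf[32];
    ssize_t n;
    int fd = open(lockfile, O_RDONLY);

    if (fd < 0)
        return -1;
    n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    return strtol(buf, NULL, 10);
}

/* Stand-in for the "usual way": create the lock atomically or fail. */
int grab_lock(const char *lockfile)
{
    char buf[32];
    int len;
    int fd = open(lockfile, O_WRONLY | O_CREAT | O_EXCL, 0444);

    if (fd < 0)
        return -1;                  /* someone else holds the lock */
    len = snprintf(buf, sizeof buf, "%10ld\n", (long)getpid());
    if (write(fd, buf, (size_t)len) != len) {
        close(fd);
        unlink(lockfile);
        return -1;
    }
    close(fd);
    return 0;
}

/* Returns 0 if we now hold the lock, -1 if we must wait and retry. */
int acquire_lock(const char *lockfile, unsigned pause_secs)
{
    long pid;

    if (access(lockfile, F_OK) == 0) {
        pid = read_lock_pid(lockfile);
        if (pid <= 0)
            return -1;              /* unreadable: leave it alone */
        if (kill((pid_t)pid, 0) == 0 || errno != ESRCH)
            return -1;              /* owner is still alive */
        unlink(lockfile);           /* owner is gone: remove the dead lock */
        /*
         * The strategic pause: give any other process that also saw
         * the dead pid time to finish its own unlink before anyone
         * creates a fresh lock.
         */
        sleep(pause_secs);
    }
    return grab_lock(lockfile);
}
```

A caller would use something like acquire_lock("LCK..ttyd0", 5) and
retry later on -1; the lock name and the 5-second pause here are
placeholders, the latter chosen to match the sleep(5) quoted above.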