Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site brl-tgr.ARPA
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!think!harvard!seismo!brl-tgr!gwyn
From: gwyn@brl-tgr.ARPA (Doug Gwyn )
Newsgroups: net.unix-wizards
Subject: Re: Extended file system / File locking on networks
Message-ID: <1106@brl-tgr.ARPA>
Date: Mon, 30-Dec-85 10:16:21 EST
Article-I.D.: brl-tgr.1106
Posted: Mon Dec 30 10:16:21 1985
Date-Received: Wed, 1-Jan-86 00:42:33 EST
References: <910@brl-tgr.ARPA> <2adcce15.1de6@apollo.uucp> <1011@brl-tgr.ARPA> <371@l5.uucp>
Organization: Ballistic Research Lab
Lines: 43

> Note that 4.2BSD also has file locking support, and that it doesn't work
> on NFS, and that so few programs break because of this that it's not
> worth mentioning.  How many things really use Sys V file locking?

If I had it, I'd sure use it.  (RECORD locking more than FILE locking.)
Living on a 4.2BSD kernel, I have no record locking at all (my System V
emulation obviously can't compensate adequately for this lack).  Since
record locking is a recent addition, it is not surprising that
previously existing utilities don't use it; that proves nothing about
its possible future importance.

> Note also that a serious file locking mechanism on a network must provide
> a way for a user program to be notified that the system has broken its lock.
> This situation occurs when a process locks a file on another machine,
> and a comm link between the two machines goes down.  You clearly can't
> keep your database down for hours while AT&T (grin) puts your long line
> back in service, so the lock arbiter reluctantly breaks the lock.  (It
> can't tell if your machine crashed or whether it was just a comm
> line failure anyway.)  Now everybody can get at the file OK, but when the
> comm link comes back up, the process will think it owns the lock and
> will muck with the file.
> So far nobody has designed a mechanism to tell
> the process that this has happened, which means to be safe the system must
> kill -9 any such process when this happens (e.g. it must make it *look*
> like the system or process really did crash, even though it was just a
> comm link failure).  I'm not sure how you even *detect* this situation
> though.

I don't see a big problem.  There are three possible cases of failure:

(1) System owning the data crashes.  In this case, the remote process
    will soon perform an I/O on the locked record/file (if it doesn't,
    you have a problem even on a single system) which will fail (should
    return EIO; could generate a signal instead, I suppose).  The
    regular failure recovery should suffice (involves freeing locks,
    perhaps as a side-effect of closing the file descriptor, perhaps
    automatically upon I/O error).

(2) Communication link crashes.

(3) Remote system crashes after planting a lock.

Cases (2) and (3) are the interesting ones, but they can be easily
handled by simply pinging the locking system when a lock conflict
occurs.  (Various strategies could be used to reduce pinging frequency,
if desired, but I don't think it would be necessary.)  If the locker
denies knowledge of the lock, then void it locally and proceed.

The above approach probably doesn't work on stateless remote file
systems such as NFS, but this started out as a general RFS discussion.
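
To make the ping-on-conflict handling of cases (2) and (3) concrete,
here is a rough sketch in Python.  Everything in it is hypothetical:
Host, LockArbiter, knows_lock and Unreachable are names invented for
illustration, and the "ping" is an ordinary method call rather than a
real network request.  One thing the scheme above leaves open is what
to do when the ping gets no answer at all; this sketch simply assumes
the lock stays in place and the new request is refused.

```python
class Unreachable(Exception):
    """Raised when a pinged host does not answer (hypothetical)."""


class Host:
    """Stand-in for a remote machine that may hold locks (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.up = True      # False simulates a crash or a dead comm link
        self.held = set()   # records this host believes it has locked

    def knows_lock(self, record):
        """Answer a ping: does this host still claim the lock on record?"""
        if not self.up:
            raise Unreachable(self.name)
        return record in self.held

    def crash_and_reboot(self):
        """Case (3): the host returns with no memory of its old locks."""
        self.held.clear()
        self.up = True


class LockArbiter:
    """Lock table on the system that owns the data (illustrative)."""

    def __init__(self):
        self.owners = {}    # record -> Host currently recorded as locker

    def _grant(self, host, record):
        self.owners[record] = host
        host.held.add(record)

    def lock(self, host, record):
        """Try to lock record for host; return True if the lock is granted."""
        owner = self.owners.get(record)
        if owner is None or owner is host:
            self._grant(host, record)
            return True
        # Lock conflict: ping the recorded locker before refusing.
        try:
            still_held = owner.knows_lock(record)
        except Unreachable:
            # ASSUMPTION (not from the post): no answer means keep the lock.
            return False
        if still_held:
            return False
        # The locker denies knowledge of the lock: void it locally, proceed.
        self._grant(host, record)
        return True
```

With two hosts a and b, b's request for a record locked by a is refused
while a still claims it (or cannot be reached), and granted once a has
rebooted and denies knowledge of the lock.  Pinging only on conflict
means an uncontended lock costs no extra traffic.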