Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site brl-tgr.ARPA
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!think!harvard!seismo!brl-tgr!gwyn
From: gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>)
Newsgroups: net.unix-wizards
Subject: Re: Extended file system / File locking on networks
Message-ID: <1106@brl-tgr.ARPA>
Date: Mon, 30-Dec-85 10:16:21 EST
Article-I.D.: brl-tgr.1106
Posted: Mon Dec 30 10:16:21 1985
Date-Received: Wed, 1-Jan-86 00:42:33 EST
References: <910@brl-tgr.ARPA> <2adcce15.1de6@apollo.uucp> <1011@brl-tgr.ARPA> <371@l5.uucp>
Organization: Ballistic Research Lab
Lines: 43

> Note that 4.2BSD also has file locking support, and that it doesn't work
> on NFS, and that so few programs break because of this that it's not
> worth mentioning.  How many things really use Sys V file locking?

If I had it, I'd sure use it.  (RECORD locking more than FILE locking.)
Living on a 4.2BSD kernel, I have no record locking at all (my System V
emulation obviously can't compensate adequately for this lack).  Since
record locking is a recent addition, it is not surprising that previously
existing utilities don't use it; that proves nothing about its possible
future importance.

> Note also that a serious file locking mechanism on a network must provide
> a way for a user program to be notified that the system has broken its lock.
> This situation occurs when a process locks a file on another machine, 
> and a comm link between the two machines goes down.  You clearly can't
> keep your database down for hours while AT&T (grin) puts your long line
> back in service, so the lock arbiter reluctantly breaks the lock.  (It
> can't tell if your machine crashed or whether it was just a comm
> line failure anyway.)  Now everybody can get at the file OK, but when the
> comm link comes back up, the process will think it owns the lock and
> will muck with the file.  So far nobody has designed a mechanism to tell
> the process that this has happened, which means to be safe the system must
> kill -9 any such process when this happens (e.g. it must make it *look*
> like the system or process really did crash, even though it was just a
> comm link failure).  I'm not sure how you even *detect* this situation
> though.

I don't see a big problem.  There are three possible cases of failure:
(1)  System owning the data crashes.  In this case, the remote process
will soon perform an I/O on the locked record/file (if it doesn't, you
have a problem even on a single system) which will fail (should return
EIO; could generate a signal instead, I suppose).  The regular failure
recovery should suffice (involves freeing locks, perhaps as a side-effect
of closing the file descriptor, perhaps automatically upon I/O error).
(2)  Communication link crashes.
(3)  Remote system crashes after planting a lock.
Cases (2) and (3) are the interesting ones, but they
can be easily handled by simply pinging the locking system when a lock
conflict occurs.  (Various strategies could be used to reduce pinging
frequency, if desired, but I don't think it would be necessary.)  If the
locker denies knowledge of the lock, then void it locally and proceed.
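
A minimal sketch of the ping-on-conflict idea, in Python for brevity.
Every name in it (LockServer, Client, ping, void_on_timeout) is
hypothetical, and the network is simulated by direct method calls.
What to do when the ping itself gets no answer is not settled above;
the sketch assumes the conservative choice of refusing the new lock
unless void_on_timeout is set.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Key = Tuple[str, int]  # (file name, record number); hypothetical naming


class LinkDown(Exception):
    """Raised when the locking system cannot be reached (case 2)."""


@dataclass
class Client:
    """A remote system that may hold locks on the server's records."""
    name: str
    held: Set[Key] = field(default_factory=set)
    reachable: bool = True  # simulated state of the comm link

    def ping(self, key: Key) -> bool:
        """Answer the server's question: do you still hold this lock?"""
        if not self.reachable:
            raise LinkDown(self.name)
        return key in self.held

    def crash_and_reboot(self) -> None:
        """Case (3): the client loses all memory of its locks."""
        self.held.clear()


@dataclass
class LockServer:
    """The system owning the data; it keeps the lock table."""
    table: Dict[Key, Client] = field(default_factory=dict)
    # Assumption: policy for an unanswered ping is a site decision.
    void_on_timeout: bool = False

    def acquire(self, requester: Client, key: Key) -> bool:
        owner = self.table.get(key)
        if owner is not None and owner is not requester:
            # Lock conflict: ping the recorded owner before refusing.
            try:
                still_held = owner.ping(key)
            except LinkDown:
                still_held = not self.void_on_timeout
            if still_held:
                return False
            # The locker denies knowledge (or is presumed gone):
            # void the lock locally and proceed.
        self.table[key] = requester
        requester.held.add(key)
        return True

    def release(self, requester: Client, key: Key) -> None:
        if self.table.get(key) is requester:
            del self.table[key]
        requester.held.discard(key)
```

Note that with void_on_timeout set, a holder cut off by a link failure
still believes it owns the lock when the link returns, which is the
hazard raised in the quoted text; the sketch does nothing about that.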

The above approach probably doesn't work on stateless remote file
systems such as NFS, but this started out as a general RFS discussion.