Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site l5.uucp
Path: utzoo!linus!decvax!ittatc!dcdwest!sdcsvax!ucbvax!ucdavis!lll-crg!l5!gnu
From: gnu@l5.uucp (John Gilmore)
Newsgroups: net.unix-wizards
Subject: Re: Extended file system / File locking on networks
Message-ID: <371@l5.uucp>
Date: Mon, 30-Dec-85 05:34:01 EST
Article-I.D.: l5.371
Posted: Mon Dec 30 05:34:01 1985
Date-Received: Tue, 31-Dec-85 00:59:29 EST
References: <910@brl-tgr.ARPA> <2adcce15.1de6@apollo.uucp> <1011@brl-tgr.ARPA>
Organization: Nebula Consultants in San Francisco
Lines: 52

In article <1011@brl-tgr.ARPA>, gwyn@brl-tgr.ARPA (Doug Gwyn) writes:
> AT&T's RFS, I was told, treats a network link going down the same
> as it would a disk going off-line; there will be an error returned
> from any subsequent attempt to do I/O to the inaccessible file.
> The obvious alternative to I/O errors when a net link goes down is
> to block processes doing remote file I/O over the link until it
> comes back up; this is probably unwise for record locking systems.

The Sun NFS provides both options when a link or machine goes down.
If you have mounted the file system "hard", then it blocks I/O ops
until it comes back. If you mount "soft", it retries a few times and
then returns an error code. I tended to mount non-critical stuff soft,
e.g. my net.sources archives, so in case I touched them while the
server was down, I wouldn't hang with unkillable processes. For your
root partition you tend to want a hard mount...

> Note that full support for UNIX file system semantics is a crucial
> issue for AT&T UNIX System V systems, which support record locking.

Note that 4.2BSD also has file locking support, and that it doesn't
work on NFS, and that so few programs break because of this that it's
not worth mentioning. How many things really use Sys V file locking?
Certainly not all the Unix utilities that remain unchanged since V7.

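For readers who have not used the 4.2BSD locking call, here is a minimal
sketch of flock() on a local file, written in Python for brevity. The
helper names (try_lock, demo) are made up for illustration. It assumes a
local file system where flock() is supported, and it shows one likely
reason so few programs notice when locking is missing: the locks are
advisory, so a program that never asks for a lock is never stopped by one.

```python
import fcntl
import os
import tempfile


def try_lock(fd):
    """Try to take an exclusive flock() lock without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        # EWOULDBLOCK: some other open file description holds the lock.
        return False


def demo():
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    holder = os.open(path, os.O_RDWR)     # cooperating program 1
    rival = os.open(path, os.O_RDWR)      # cooperating program 2
    ignorant = os.open(path, os.O_WRONLY)  # program that never locks
    try:
        got_first = try_lock(holder)
        got_second = try_lock(rival)
        # Advisory locking: the write goes through even though the file
        # is locked, because this descriptor never asked for the lock.
        written = os.write(ignorant, b"no lock taken\n")
        fcntl.flock(holder, fcntl.LOCK_UN)
        got_after_release = try_lock(rival)
        return got_first, got_second, written, got_after_release
    finally:
        for fd in (holder, rival, ignorant):
            os.close(fd)
        os.unlink(path)
```

Calling demo() on a local file system should show the first lock
granted, the second refused until the first is released, and the
unlocked write succeeding throughout.
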
Note also that a serious file locking mechanism on a network must
provide a way for a user program to be notified that the system has
broken its lock. This situation occurs when a process locks a file on
another machine, and a comm link between the two machines goes down.
You clearly can't keep your database down for hours while AT&T (grin)
puts your long line back in service, so the lock arbiter reluctantly
breaks the lock. (It can't tell if your machine crashed or whether it
was just a comm line failure anyway.)

Now everybody can get at the file OK, but when the comm link comes back
up, the process will think it owns the lock and will muck with the
file. So far nobody has designed a mechanism to tell the process that
this has happened, which means to be safe the system must kill -9 any
such process when this happens (e.g. it must make it *look* like the
system or process really did crash, even though it was just a comm link
failure). I'm not sure how you even *detect* this situation though.

This never happened on single machines with file or record locking
because when the kernel crashes, it takes all the user processes with
it, so when it comes back up, they won't be around to munge the file.

Sun (Jo-Mei Chang) is doing some research on how to have the lock
manager know within 30 seconds or so that your host has gone down (so
it can break the lock), but last time I heard, her scheme relied
heavily on broadcast or multicast packets, and gets very inefficient as
soon as you start doing serious traffic thru a gateway or a
non-broadcast network. And even if they implemented the System V file
locking standard using such a lock manager, that doesn't solve the
above problem.
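
The broken-lock scenario above can be walked through with a toy
simulation, again in Python. Everything here is hypothetical: the
LockArbiter and Client classes, the heartbeat call, and the 30-second
silence threshold are assumptions made for illustration, not anything
from RFS, NFS, or the Sun research. The sketch only reproduces the
problem. It does not solve it.

```python
class Client:
    """A process on a remote machine (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.thinks_it_holds_lock = False  # purely local belief


class LockArbiter:
    """Toy lock arbiter living on the file server (hypothetical)."""

    def __init__(self, break_after):
        self.break_after = break_after  # seconds of silence tolerated
        self.owner = None
        self.last_heard = None

    def heartbeat(self, client, now):
        if self.owner == client.name:
            self.last_heard = now

    def acquire(self, client, now):
        # Break the lock if its owner has been silent for too long. The
        # arbiter cannot tell a crashed host from a dead comm link.
        silent_too_long = (
            self.owner is not None
            and now - self.last_heard >= self.break_after
        )
        if silent_too_long:
            self.owner = None  # note: the old owner is never told
        if self.owner is None:
            self.owner = client.name
            self.last_heard = now
            client.thinks_it_holds_lock = True
            return True
        return False


def simulate():
    arbiter = LockArbiter(break_after=30)
    a, b = Client("a"), Client("b")
    events = []
    events.append(("a acquires", arbiter.acquire(a, now=0)))
    arbiter.heartbeat(a, now=5)
    events.append(("b acquires while a holds", arbiter.acquire(b, now=10)))
    # The comm link to a's machine goes down here: no more heartbeats.
    events.append(("b acquires after break", arbiter.acquire(b, now=45)))
    # The link comes back. Nothing has updated a's local belief.
    stale = a.thinks_it_holds_lock and arbiter.owner != a.name
    events.append(("a holds a stale lock", stale))
    return events
```

At the end of simulate(), client a still believes it owns the lock while
the arbiter has handed it to b, which is exactly the state in which a
would go on to muck with the file.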