Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!cs.uoregon.edu!ogicse!intelhf!ichips!iwarp.intel.com!gargoyle!chinet!les
From: les@chinet.chi.il.us (Leslie Mikesell)
Newsgroups: comp.mail.misc
Subject: Re: Will ELM ever use lockf()?
Message-ID: <1991Apr11.181909.13503@chinet.chi.il.us>
Date: 11 Apr 91 18:19:09 GMT
References: <2800BA22.1CAE@tct.com> <1991Apr09.160512.1300@chinet.chi.il.us> <1991Apr11.024849.29924@Veritas.COM>
Organization: Chinet - Chicago Public Access UNIX
Lines: 70

In article <1991Apr11.024849.29924@Veritas.COM> tron@Veritas.COM (Ronald S. Karr) writes:
>>...
>>Failure scenario:
>>  Process A owns lockfile - Processes B & C are contending for one.
>>  B reads A's PID from lockfile.
>>  A finishes, removes lockfile and exits.
>>  B sends signal 0 to A's PID, notes that process is gone.
>>  C notes that no lockfile is present and creates one.
>>  B removes lockfile (now belonging to C) and creates one.
>>  At this point both B and C think they have exclusive access to
>>  the mailbox.

>However, both smail3 and elm create the files using the O_EXCL flag,
>indicating that the OS should ensure that only one of the two creates
>should succeed.

>Thus, failure of the locking mechanism in the way that you describe
>requires that the underlying operating system fail to obey the O_EXCL
>semantics.

No, look at the scenario again.  The real problem is that the process
that decides a lockfile is stale has no way to tell that the file
that it unlink()s is the same one that it tested.  In the scenario
above, A has removed its own lockfile (which wasn't really stale but
looked that way because the process exited before B's signal arrived)
before C creates its lockfile with no apparent conflict.  Then
B's unlink() removes the one that C created, so O_EXCL doesn't come
into play at all.  There are lots of other ways this can happen but
I suspect this is the most likely.
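
For illustration only, the interleaving can be replayed in a single
process.  This is a minimal sketch, not code from elm or smail3: the
helper names, the PIDs 100/200/300 and the simulated kill(pid, 0)
result are all assumptions.  It only shows that every create uses
O_EXCL and succeeds, yet both B and C end up believing they hold
the lock.

```python
import os
import tempfile


def create_lock(path, pid):
    """Create the lockfile with O_EXCL and record the owner's PID in it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(str(pid))


def read_owner(path):
    """Return the PID stored in the lockfile."""
    with open(path) as f:
        return int(f.read())


def replay_race(lock):
    """Replay the A/B/C interleaving one step at a time.

    Returns (pid B read, who believes they hold the lock, final owner).
    """
    holders = []
    create_lock(lock, 100)      # A (hypothetical PID 100) owns the lockfile
    seen = read_owner(lock)     # B reads A's PID
    os.unlink(lock)             # A finishes and removes its own lockfile
    a_is_gone = True            # B's kill(pid, 0) finds no process (simulated)
    create_lock(lock, 300)      # C sees no lockfile; its O_EXCL create succeeds
    holders.append("C")
    if a_is_gone:
        os.unlink(lock)         # B removes the "stale" lock by name: it is C's
        create_lock(lock, 200)  # B's O_EXCL create succeeds as well
        holders.append("B")
    return seen, holders, read_owner(lock)


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    print(replay_race(os.path.join(d, "mbox.lock")))
```

No os.open() call fails here, which is the point: the unlink() is by
name, so O_EXCL never gets a chance to object.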

>An important case to point out is that NFS does not obey the O_EXCL
>semantics between two different client machines.  As such, mailbox locking
>is broken on machines that do not use lockf(), which includes all sun
>machines as far as I know.  The only solution is to either change smail
>and all of the mail readers to use lockf(), or to never send and receive
>mail from two clients sharing the same NFS /usr/mail directory.

Personally, I think the "right" solution is to deliver new mail into
individual files per message with a naming convention so incomplete
temporary files could be ignored by the mail reader.  There would be
several other advantages as well:
 You could require the destination directory to already exist, which
 would allow you to detect an unmounted NFS/RFS mount point and defer
 delivery instead of writing in a directory that will become hidden
 when the net comes back up.
 You could deliver to multiple recipients on the same machine with
 a simple link.  This would make mailing lists about as cheap as
 a newsgroup and easier to maintain since the disk space would
 automatically be released when the last recipient's copy is deleted.
This isn't likely to happen, of course, since the changes to the
user agents and the transports would have to be synchronized, but it
would not be too difficult to add this support to either one.  Ideally,
the reader would consolidate the new messages into some other format
if they are to be stored after reading.
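
A rough sketch of what such a delivery scheme might look like follows.
Everything in it is an assumption made for illustration: the "tmp."
and "msg." prefixes, the caller-supplied sequence number (which must
be unique), and the function names.  No existing transport or reader
is known to use this layout.  It writes the message under a temporary
name, renames it once complete, links it for further recipients, and
defers any recipient whose directory does not exist.

```python
import os


def deliver(message, maildirs, seq):
    """Deliver message (bytes) to every directory in maildirs that exists.

    Returns (delivered paths, deferred directories).  All directories must
    be on the same filesystem for the hard links to work.
    """
    delivered, deferred = [], []
    first = None
    for d in maildirs:
        if not os.path.isdir(d):
            # Missing directory (e.g. an unmounted mount point): defer.
            deferred.append(d)
            continue
        final = os.path.join(d, "msg.%d" % seq)
        if first is None:
            # Write under a temporary name so readers can ignore it,
            # then rename once the message is complete.
            tmp = os.path.join(d, "tmp.%d" % seq)
            fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            with os.fdopen(fd, "wb") as f:
                f.write(message)
            os.rename(tmp, final)
            first = final
        else:
            # Further recipients share the same data through a link.
            os.link(first, final)
        delivered.append(final)
    return delivered, deferred


def new_messages(d):
    """What a reader would list: completed messages only, no tmp files."""
    return sorted(n for n in os.listdir(d) if n.startswith("msg."))
```

With links, deleting one recipient's copy leaves the others intact,
and the space is freed when the last link goes away.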

>Of course, the reason people almost never
>see the problem is that the window is extremely small given that an
>average mailbox receives only a few messages (up to 100) per day, and
>given that the window of vulnerability is typically sub-second.

For the scenario I mentioned, you need at least three processes, so
using a delivery mode where the foreground process just queues the
file and a single daemon process handles all deliveries should make
it safe where O_EXCL works.  But since removing a lockfile is
never safe, I think it is worth testing its age anyway.  Unless something
has gone drastically wrong you should never need to remove another
process's lockfile.
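
One way the age test might be combined with the PID test is sketched
below.  The 300-second threshold and the function names are arbitrary
assumptions, and the check only narrows the window: the unlink() that
follows a "stale" verdict is still by name and can still hit a
different file than the one tested.

```python
import os
import time

STALE_AFTER = 300  # seconds; an arbitrary threshold chosen for illustration


def pid_alive(pid):
    """Probe a PID with signal 0, as in the scenario above."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True   # the process exists but belongs to someone else
    return True


def looks_stale(path, alive=pid_alive, now=None):
    """True only if the lockfile is old AND its recorded owner is gone."""
    try:
        with open(path) as f:
            pid = int(f.read())
        mtime = os.stat(path).st_mtime
    except (OSError, ValueError):
        # Vanished or half-written lockfile: not stale, just retry
        # the O_EXCL create later.
        return False
    age = (time.time() if now is None else now) - mtime
    return age > STALE_AFTER and not alive(pid)
```

A lockfile that was just created by a third process fails the age
test, so it would not be removed on the strength of a dead PID alone.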

Les Mikesell
  les@chinet.chi.il.us