Path: utzoo!attcan!uunet!decwrl!sgi!vjs@rhyolite.wpd.sgi.com
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
Newsgroups: comp.protocols.nfs
Subject: Re: NFS writes and fsync().
Message-ID: <72791@sgi.sgi.com>
Date: 21 Oct 90 08:33:56 GMT
References: <1990Oct9.152612@objy.objy.com> <thurlow.655748135@convex.convex.com> <143983@sun.Eng.Sun.COM>
Sender: guest@sgi.sgi.com
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 124


In article beepy@ennoyab.Eng.Sun.COM (Brian Pawlowski) writes:
> ...

We agree:
  -"NFS is stateless" is not a technical statement, but describes the
    general philosophy used by the NFS designers.
  -NFS!=UFS.
  -the "NFS server cache dogma" increases the reliability of the system,
    and is consistent with the stateless philosophy.


>                           ...           The need for an "XID cache"
> addresses a "bug" in the protocol.

I may disagree.  The external behavior produced by an XID cache should have
been specified in the beginning.  It is required by real world networks.
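
For concreteness, here is a minimal sketch of the external behavior I mean.
It is an illustration only, not any vendor's implementation: the class name
XidCache, its capacity, and the (client, xid) key are assumptions made up
for this example. A retransmitted request gets the reply that was already
sent instead of being executed a second time.

```python
from collections import OrderedDict


class XidCache:
    """Hypothetical duplicate-request cache keyed by (client, xid)."""

    def __init__(self, capacity=128):
        self.capacity = capacity          # assumed size, chosen arbitrarily
        self.replies = OrderedDict()      # oldest entry first

    def handle(self, client, xid, operation):
        key = (client, xid)
        if key in self.replies:
            # Retransmission: replay the saved reply, do not re-execute.
            self.replies.move_to_end(key)
            return self.replies[key]
        reply = operation()
        self.replies[key] = reply
        if len(self.replies) > self.capacity:
            self.replies.popitem(last=False)   # evict the oldest entry
        return reply


# A non-idempotent operation: removing a file succeeds only once.
files = {"a.txt"}


def remove_a():
    if "a.txt" in files:
        files.remove("a.txt")
        return "NFS_OK"
    return "NFSERR_NOENT"


cache = XidCache()
first = cache.handle("client1", 42, remove_a)   # original request
retry = cache.handle("client1", 42, remove_a)   # same XID, retransmitted
fresh = cache.handle("client1", 43, remove_a)   # new XID, a genuinely new request
```

Without the cache the retransmission would come back as NFSERR_NOENT even
though the remove succeeded, which is the real world network problem.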

>                                    Suggestions to the effect of
> eliminating syncing data to stable storage on a server before returning
> NFS_OK on a write undermines basic assumptions made by clients.

No, the synchronous server data cache is an implementation of "safe" or
high MTBF server data storage.  What if I build a server with non-volatile
RAM (e.g. a 180-day UPS on a Sun), put cache blocks in a reserved part of
RAM, have the bootstrap code for UNIX and the diagnostics discover and
preserve all valid cache blocks, and operate in the evil async mode?
(Similarities to Prestoserve are unintended and inevitable.)  If
Prestoserve is OK, then so is this, or any other with the same MTBF.

> There are a lot of interesting "state" thingies agreed to by the
> clients and servers. File handles are agreed to "persist" over a
> crash.

Good point.

> [Vernon: Do you feel like posting an enumerated prioritized list of
> missing features in NFS....

The UNIX file system is not holy.  What NFS lacks is irrelevant, except
where users need it.   NFS needs about 6 things, including some
kind of cache operation like that discussed recently, some open-unlink
support, and a few others that our local NFS Master chants under his
breath.  The new protocol had almost all of them 2 years ago.  It's too bad
it went non-linear.

>    ... I would postulate that most server crashes don't result
> in lost disks, and that ...[sync writes work]...

Yes, synchronous operation is a good BruteForceAndIgnorance implementation
of what should be the protocol requirement.  (I like BF&I--on the first cut.)

> Have you or anyone ever seen NFS servers with "intelligent" caching
> disk controllers create a "loss of data" problem?

Good point.  I've heard rumors, but not seen anything.   (We limit
controller caches--sync. writes wait.)

> At this point I'm wondering if you are advocating throwing away
> the "requirement" for a server to flush to "stable" storage? Are you?

Yes, the stated requirement is bogus.  Pick whatever MTBF or equivalent
you wish, but please stop using an implementation to describe a protocol.
(Yes, sometimes an implementation is the best spec, but only if you can
and do say which characteristics you really care about.)

> ...
> You firmly believe this "flush to stable storage" requirement is in 
> the realm of dogma?

Yes.  It is a taboo or folk medicine like quinine water or willow bark.  It
is ok, if we have rationally chosen it instead of its equivalents, or if we
don't understand it.  Since we all understand it, let's replace the taboo
with engineering requirements.

Actually, "flush to stable storage" would be ok, if people would not
keep reading it as "call bwrite()," and if it were quantitative.
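
A rough sketch of the distinction, under loud assumptions: the names
DiskStore, NvramStore, and nfs_write are invented for this example, and
NvramStore is only a dictionary standing in for battery-backed RAM, so it
is not actually non-volatile. The point is that the write handler depends
only on commit() returning after the data is as safe as required, not on
how that is achieved.

```python
import os
import tempfile


class DiskStore:
    """Stable storage done the brute-force way: write, then fsync."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def commit(self, offset, data):
        os.lseek(self.fd, offset, os.SEEK_SET)
        os.write(self.fd, data)
        os.fsync(self.fd)                 # wait until the OS reports it flushed

    def close(self):
        os.close(self.fd)


class NvramStore:
    """Stand-in for battery-backed RAM (hypothetical; a dict is NOT stable)."""

    def __init__(self):
        self.blocks = {}

    def commit(self, offset, data):
        self.blocks[offset] = bytes(data)  # real hardware would survive a crash


def nfs_write(store, offset, data):
    """Reply NFS_OK only after the store says the data is committed."""
    store.commit(offset, data)
    return "NFS_OK"


with tempfile.TemporaryDirectory() as tmp:
    disk = DiskStore(os.path.join(tmp, "export.dat"))
    disk_status = nfs_write(disk, 0, b"hello")
    disk.close()

nvram = NvramStore()
nvram_status = nfs_write(nvram, 0, b"hello")
```

Either store satisfies the handler. Which one is acceptable is the
quantitative question, and the sketch does not answer it.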

> ...
> The protocol specification dictates pretty straightforward
> external characteristics.

The protocol dictates many external characteristics in terms of an
implementation.  The only complete protocol spec comes on the Sun reference
tapes.  Still, I much prefer the NFS protocol spec-tape to the ANSI/IEEE
paper swill I've been fighting lately.

> > I wonder if it was not mostly a statement about the lack of reliability of
> > NFS servers of the time (i.e. 68010 UNIX systems in 1984).
> 
> Is your basis simply then that today servers are more reliable, and that
> in practice this is not a problem? Is server reliability the critical
> factor or are external factors like power outages, errant flipping
> of power switches, etc. significant? I would assume that disk MTBF's
> were much greater than server MTBF's, and synchronous writes exploit
> this.

Yes, careful operators, solid hardware, a UPS, and sufficiently bug-free
software are more important and effective than synchronous writes ever
were.  People do more damage to files with keyboards than with power
switches.  The standard lightning drill has always been to hit the switch
to keep power off to protect disks from flickers and surges.  The servers I
use have reasonably balanced MTBF's.  The big NFS servers I know about (all
source for everything from forever on racks of GB drives) stay up for
months, and suffer disk problems as often as all others.

> > The NFS cache dogma does solve problems, but those problems are of people
> > selling things, not of people building or buying things.
> 
> Wow. Wow again. I'm thinking about what everyone is selling (including
> you). 

I was referring to "selling" as in "the marketing department," not selling
as in verbally counting coup.  I'm not selling because I won't get any $ if
I convince you--I might get less because your boxes would be better.

>     ...    this seems to be an increasingly polarizing issue.

In my personal experience it has been very controversial since 1985.  Until
Prestoserve broke the ice, 15% of the NFS vendors had been hiding in the
closet.  It's just that now we're "coming out."


Vernon Schryver,    vjs@sgi.com