Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!convex!convex.convex.com!thurlow
From: thurlow@convex.com (Robert Thurlow)
Newsgroups: comp.protocols.nfs
Subject: Re: NFS writes and fsync().
Message-ID: <thurlow.655748135@convex.convex.com>
Date: 12 Oct 90 16:15:35 GMT
References: <1990Oct9.152612@objy.objy.com>
Sender: usenet@convex.com
Lines: 52

In <1990Oct9.152612@objy.objy.com> peter@objy.objy.com (Peter Moore) writes:
>WHY ON EARTH DOES NFS REQUIRE THE FSYNC ON WRITES?  Without that
>requirement, we could get the effect of this cache board by just not
>calling fsync().

No, you couldn't.  The cache board for PCs that I know about is a nice
unit that essentially promises you the data won't go away and keeps it
in battery-backed memory to ensure it.  That's important, since once the
write request is acknowledged, the client will not try the write again,
and may discard its copy of the data.  If the server acknowledges a
write without syncing it and then goes down, that data is gone.
Usually, too, what waits for the acknowledgement is a block I/O daemon
(biod) that handles your async writes for you; your process has to wait
for all I/O only when it does an fsync() or a close(), though aggregate
throughput is reduced.
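
To make the client side of that concrete, here is a minimal C sketch,
not taken from any particular implementation.  The helper name
write_durably() is made up for illustration; the calls inside it are
the ordinary POSIX ones.  The point is that a successful write() may
only mean the data has been queued, so a failed remote write can show
up later at fsync() or close(), and both return values need checking.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Close fd but keep the errno of the call that actually failed. */
static int fail_and_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: write len bytes to path and return 0 only once
 * fsync() and close() have both succeeded; otherwise return -1 with
 * errno describing the failure. */
int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);  /* success may only mean "queued" */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            return fail_and_close(fd);
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) < 0)                  /* wait for outstanding writes */
        return fail_and_close(fd);

    return close(fd);                   /* deferred errors can surface here too */
}
```

A caller that ignores the result of fsync() or close() can miss an
error that write() never reported.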

I think most people would agree that the default behaviour should be to
make writes reliable, since that provides the semantics of a local
filesystem.  You are more free to buy extra throughput by upgrading the
server disk or CPU than you are to buy more reliability.  That said,
I'll add that we do provide an export option to allow you to tell the
server to acknowledge the write request immediately upon receipt, and
spool the request to its local I/O subsystem.  It can help performance
a good bit if you don't mind the risks.  It's great for filesystems all
clients mount with -soft; their processes will be gone after a server
reboot, anyway.
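
As a rough illustration of the two server behaviours, here is a C
sketch.  It is not our server code or anyone else's: the enum
ack_policy, its two values, and handle_write() are all hypothetical
names, and a real server works from file handles rather than path
names.  It only shows where the sync sits relative to the point at
which the reply may be sent.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy switch; real servers expose this differently. */
enum ack_policy {
    ACK_AFTER_SYNC,   /* default: data is on stable storage before the reply */
    ACK_ON_RECEIPT    /* faster: reply once the data is handed to the local I/O system */
};

/* Hypothetical write handler.  Returns 0 when the caller may send a
 * successful reply to the client, -1 if the request failed. */
int handle_write(const char *path, off_t off, const void *buf, size_t len,
                 enum ack_policy policy)
{
    int ok = 0;
    int fd = open(path, O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    if (lseek(fd, off, SEEK_SET) == (off_t)-1) {
        ok = -1;
    } else {
        ssize_t n = write(fd, buf, len);
        if (n < 0 || (size_t)n != len)
            ok = -1;                    /* treat a short write as failure */
    }

    /* With ACK_AFTER_SYNC the reply waits for the disk; with
     * ACK_ON_RECEIPT a crash after the reply can lose this data. */
    if (ok == 0 && policy == ACK_AFTER_SYNC && fsync(fd) < 0)
        ok = -1;

    if (close(fd) < 0)
        ok = -1;
    return ok;
}
```

The only difference between the two policies is the fsync() before
returning, which is exactly the trade between speed and risk.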

>Now whenever I see something ugly in NFS, it usually comes from the
>stateless requirement.  But the only state dependent reason I can see is:
>    Process P on machine A writes to machine B
>    machine B crashes before the write is synced to disk

Stop right there.  Your 'disk' has just lost data, period.  Do you
expect your local disk to ever do that?  The effects could be
devastating, depending on what was relying on the data.  Think
of the havoc you could wreak on a database server.

>But in real life, I have seen situations vaguely like this, and the
>writing process gets a `stale NFS handle' error.  So it seems that at
>least the NFS implementations I have run into have that much state.

ESTALE only happens when the server can't find anything matching the
file handle on its disks, and usually happens when some other process
did a creat() or an unlink(), or the server filesystems got mounted in
a different order.  I don't see the connection here.
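
For what it's worth, a client program that does see ESTALE can only
recover by looking the file up again by name.  A minimal C sketch,
assuming the path is still valid on the server; recover_from_estale()
is a hypothetical helper, not part of any NFS client library:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper.  err is the errno from the last failed
 * operation on *fd (0 if it succeeded).  On ESTALE the old descriptor
 * is closed and the file is reopened by name, which forces a fresh
 * lookup.  Returns 0 if *fd is usable afterwards, -1 otherwise. */
int recover_from_estale(int *fd, const char *path, int flags, int err)
{
    if (err == 0)
        return 0;               /* nothing to recover from */
    if (err != ESTALE)
        return -1;              /* some other error; not handled here */

    close(*fd);                 /* the old handle is of no further use */
    *fd = open(path, flags);    /* may fail if the name is gone as well */
    return *fd < 0 ? -1 : 0;
}
```

Note that this recovers a handle, not the data: whatever was written
through the stale handle and rejected still has to be written again.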

Hope that helped,
Rob T
--
Rob Thurlow, thurlow@convex.com or thurlow%convex.com@uxc.cso.uiuc.edu
----------------------------------------------------------------------
"I was so much older then, I'm younger than that now."