Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!Firewall!genesis!kdenning
From: kdenning@genesis.Naitc.Com (Karl Denninger)
Newsgroups: comp.protocols.nfs
Subject: Re: NFS performance
Summary: You're missing the point!  Standard Unix I/O is not "safe"
Message-ID: <1991Jun13.234448.16172@Firewall.Nielsen.Com>
Date: 13 Jun 91 23:44:48 GMT
References: <1991Jun13.164017.29944@Firewall.Nielsen.Com> <625@appserv.Eng.Sun.COM>
Sender: news@Firewall.Nielsen.Com (Usenet News)
Organization: AC Nielsen Co., Bannockburn IL
Lines: 70
Nntp-Posting-Host: genesis.naitc.com

In article <625@appserv.Eng.Sun.COM> lm@slovax.Eng.Sun.COM (Larry McVoy) writes:
>kdenning@genesis.Naitc.Com (Karl Denninger) writes:
>> 
>> I don't quite understand the fanaticism with which people preach the NFS
>> stateless nature, O_SYNC and all that.  The fact is that a crash of a 
>> LOCAL Unix machine with the normal block buffering scheme can easily cause 
>> the loss of data -- in this case, the write(2) call returned "ok" but it 
>> really might not be "OK"!  This is true whether the problem is later found 
>> to be a bad disk sector, the machine panicking, or any one of a number of 
>> other causes.  Normal disk I/O on Unix machines is NOT reliable enough to 
>> say "if you get a good return from write(), the data is safely on disk".
>
>NFS is stateless.  The reason for this statelessness is so that a client
>does not need to do anything special when a server goes down.  A dead
>server looks just like a slow server to a client.

So far, so good.

>A client issues a write, the server ACKs the write.  What does that ACK
>mean?  It means that the client data is safe.  The client kernel may
>throw away the data, the server has promised that the data can be 
>retrieved.
>
>If the server ACKs the data before writing it to disk, there is a window
>during which the server can crash.  The data is then lost.  

How does this differ from the standard "Unix" way of doing file I/O, which
returns a successful reply from a write call before the data is safely on
disk?  If you write data and get back an "ACK" (or a good return value), the
data isn't necessarily on disk -- it could be in the buffer cache.  If the
machine crashes before the data is flushed, you lose.

I can't see how this is any different from ACKing packets from NFS clients
when you haven't actually written them any further than the buffer cache
(exactly the same as the standard Unix semantics).  You have the same risks
if the server (the machine with the disk on it :-) crashes as you would with
a local workstation or server drive.  In both cases data can be lost.
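
A minimal sketch of the local-disk case, in Python and assuming a POSIX
system.  A good return from os.write() only means the kernel accepted the
bytes (typically into the buffer cache); os.fsync() is the call that asks for
them to be pushed out to the device.  The helper names and the file name are
made up for illustration, and even the synced path is only as trustworthy as
the driver and hardware underneath it.

```python
import os
import tempfile


def buffered_write(path, data):
    """Plain write(2): success means the kernel took the bytes,
    not that they have reached the disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # For a small write to a regular file this normally returns len(data).
        return os.write(fd, data)
    finally:
        os.close(fd)


def synced_write(path, data):
    """write(2) followed by fsync(2): do not report success until the
    kernel says the data has been flushed to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        n = os.write(fd, data)
        os.fsync(fd)  # blocks until the flush completes (or raises OSError)
        return n
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "example.dat")  # hypothetical file name
        buffered_write(target, b"may still be in the buffer cache\n")
        synced_write(target, b"flushed before we return\n")
```

Both helpers return the same byte count; the only difference is what has
been promised by the time they return.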

>MIPS systems have an unsafe export option that allows you to turn off
>this constraint - big performance win, big safety lose.

There is no export option in the manual pages for RiscOS 4.51 which 
addresses what you're talking about.  I just checked again; it's not there.

>There are other ways to address this problem without breaking the 
>semantics of NFS.  One such way is to buffer the writes in NVRAM.

Like Legato's PrestoServe.  Yes, I know.

That is not completely safe either.  You could have "something happen" to 
the presto board -- and your data would be lost.

The point is that standard Unix machines often say "your data is safe" when
it really isn't.  In fact, ALL systems make this assumption, by virtue of the
fact that hardware can fail.  I don't see what you buy by having the default
for NFS transactions be more "safe" than a local disk drive -- other than
making recovery from crashes simple for the client side.

I would think that one of the easiest ways to address this would be to allow
an option to have "safe" or "unsafe" writes on a per-mount basis.  This
allows each user to choose his/her own level of performance and risk.  I'd
be for that.
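
To make the per-mount idea concrete, here is a toy sketch in Python.  It is
not NFS code and not any vendor's option: ToyExport, its "safe" flag, and
handle_write() are hypothetical names invented for illustration.  The only
thing it shows is a server-side write handler that either flushes before it
ACKs ("safe") or ACKs as soon as the data is in the buffer cache ("unsafe").

```python
import os
import tempfile


class ToyExport:
    """Hypothetical stand-in for one mounted file system on a server."""

    def __init__(self, root, safe=True):
        self.root = root
        self.safe = safe   # per-mount choice: flush before ACK, or not
        self.syncs = 0     # how many times we actually waited for the disk

    def handle_write(self, name, offset, data):
        """Write data at offset and return the byte count as the 'ACK'."""
        path = os.path.join(self.root, name)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.lseek(fd, offset, os.SEEK_SET)
            n = os.write(fd, data)
            if self.safe:
                os.fsync(fd)       # pay for the disk write before replying
                self.syncs += 1
            return n
        finally:
            os.close(fd)


def demo():
    """One write through a 'safe' mount and one through an 'unsafe' mount."""
    with tempfile.TemporaryDirectory() as root:
        safe = ToyExport(root, safe=True)
        fast = ToyExport(root, safe=False)
        safe.handle_write("a", 0, b"1234")
        fast.handle_write("b", 0, b"5678")
        return safe.syncs, fast.syncs
```

The "unsafe" mount behaves like ordinary local Unix I/O; the "safe" mount
trades throughput for the guarantee described in the quoted text.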

--
Karl Denninger - AC Nielsen, Bannockburn IL (708) 317-3285
kdenning@nis.naitc.com

"The most dangerous command on any computer is the carriage return."
Disclaimer:  The opinions here are solely mine and may or may not reflect
  	     those of the company.