Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!helios!auvc8!auvsaff
From: auvsaff@auvc8.tamu.edu (David Safford)
Newsgroups: comp.unix.wizards
Subject: Re: Another reason I hate NFS: Silent data loss!
Message-ID: <17435@helios.TAMU.EDU>
Date: 17 Jun 91 13:56:15 GMT
References: <4339.Jun1501.31.5191@kramden.acf.nyu.edu> <17105@darkstar.ucsc.edu> <1991Jun17.084533.15905@prl.dec.com>
Sender: usenet@helios.TAMU.EDU
Reply-To: auvsaff@auvc8.tamu.edu (David Safford)
Lines: 48

In article <1991Jun17.084533.15905@prl.dec.com>, boyd@prl.dec.com (Boyd
Roberts) writes:
|>All this nonsense about statelessness is just a smoke screen.
|>As soon as anyone proposes a change the immediate response is
|>`but then it's not _stateless_'.  We'll as far as I'm concerned:
|>
|>    s/stateless/bug-full/
|>
|>I could never understand this nonsense.  What makes them so sure that
|>when a crashed server comes up your data will still be intact?  

It is nearly impossible for any system, stateless or stateful, to 
guarantee data integrity.  There are many approaches, and all have
certain advantages and disadvantages.  Stateless designs tend to be
simpler, faster, and less reliable.  For my research lab, Suns with
NFS have proven to be fast, convenient, and sufficiently reliable.
In fact, in four years we have not lost a single byte of data due to NFS.
Yes, there have been bugs, such as the actimeo problem and the
nfs-confused-client problem, but they were patched fairly quickly.
On the other hand, because NFS is stateless, we have NEVER been bothered
by workstation crashes, which, with our demanding distributed research
applications, have occurred all too frequently :-).
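A minimal sketch of why a stateless server shrugs off client crashes. It assumes a simplified, hypothetical write request that carries the file handle, an explicit byte offset, and the data (loosely modeled on the NFS WRITE call, not its actual wire format). Because every request is self-contained, the server keeps no per-client session, and a client that reboots and resends the same request produces the same file contents: the write is idempotent.

```python
from dataclasses import dataclass

# Hypothetical request: everything the server needs travels with each call,
# so no per-client state has to survive between requests.
@dataclass(frozen=True)
class WriteRequest:
    handle: str
    offset: int
    data: bytes

class StatelessServer:
    def __init__(self):
        # Only file contents persist; nothing records which clients exist
        # or what they have already sent.
        self.files = {}

    def write(self, req):
        buf = self.files.setdefault(req.handle, bytearray())
        end = req.offset + len(req.data)
        if len(buf) < end:
            # Zero-fill any gap so the slice below covers exactly the new data.
            buf.extend(b"\0" * (end - len(buf)))
        buf[req.offset:end] = req.data
        return end

server = StatelessServer()
req = WriteRequest("fh1", 0, b"hello")
server.write(req)
server.write(req)  # resent after a timeout or a client reboot
print(bytes(server.files["fh1"]))  # b'hello'
```

Contrast this with an "append" request that relies on a server-side file position: replaying it after a crash would duplicate the data, which is exactly the kind of state a stateless design avoids.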

|>That UDP `protocol' really sucks the mop.  Soft/hard mounts.  What a joke.
|>What's needed is a connection based stream protocol.  Then you know the
|>difference between remote slow and remote dead.  It's all a question
|>of flow control.  NFS has none.  Not even sequence numbers.  

Hmm. Simply switching to TCP or another stream protocol will not distinguish
between "remote slow and remote dead".  If anything, TCP tends to hide
remote failures.  If your connection is explicitly dropped by the remote
host, you will know immediately, but other link or kernel failures can
be hidden for a long time by TCP retries and adaptive algorithms.
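To make "hidden for a long time" concrete, here is a back-of-the-envelope sketch of how long a sender keeps retransmitting before it declares the peer dead, assuming the retransmission timeout starts at `initial_rto`, doubles after each unanswered attempt, and is capped at `max_rto` (exponential backoff). The function name and the parameter values are illustrative assumptions, not the defaults of any particular TCP implementation.

```python
def time_to_give_up(initial_rto, max_rto, retries):
    """Seconds spent waiting before the sender gives up, assuming the timeout
    doubles after every unanswered attempt and never exceeds max_rto."""
    total = 0.0
    rto = initial_rto
    # One wait for the original transmission plus one per retransmission.
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, max_rto)
    return total

print(time_to_give_up(1.0, 64.0, 12))  # 511.0
```

Under these assumed numbers, a dead host and one that is merely very slow look identical to the sender for more than eight minutes.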

|>We run a lot of NFS here, and it's as flakey as C shell.  Two of the machines
|>here just go to sleep every once in a while, when the traffic gets a little
|>strong.  God knows why.  It's going to take a lot of pondering to track it
|>down.  Even then, it's probably a fundamental design problem that can't,
|>or won't, be fixed.

We run a lot of NFS here, too, and are very happy with it.  We certainly don't
go blaming all of our application failures on it without some evidence.
If your needs dictate tighter, stateful service, then by all means feel
free to use AFS, but realize that many other people are happy with NFS.

dave safford
Texas A&M University
auvsaff@auvsun1.tamu.edu