Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!helios!auvc8!auvsaff
From: auvsaff@auvc8.tamu.edu (David Safford)
Newsgroups: comp.unix.wizards
Subject: Re: Another reason I hate NFS: Silent data loss!
Message-ID: <17435@helios.TAMU.EDU>
Date: 17 Jun 91 13:56:15 GMT
References: <4339.Jun1501.31.5191@kramden.acf.nyu.edu> <17105@darkstar.ucsc.edu> <1991Jun17.084533.15905@prl.dec.com>
Sender: usenet@helios.TAMU.EDU
Reply-To: auvsaff@auvc8.tamu.edu (David Safford)
Lines: 48

In article <1991Jun17.084533.15905@prl.dec.com>, boyd@prl.dec.com (Boyd Roberts) writes:
|>All this nonsense about statelessness is just a smoke screen.
|>As soon as anyone proposes a change the immediate response is
|>`but then it's not _stateless_'. We'll as far as I'm concerned:
|>
|>    s/stateless/bug-full/
|>
|>I could never understand this nonsense. What makes them so sure that
|>when a crashed server comes up your data will still be intact?

It is nearly impossible for any system, stateless or stateful, to
guarantee data integrity. There are many approaches, and all have
certain advantages and disadvantages. Stateless designs tend to be
simpler, faster, and less reliable. For my research lab, Suns with
NFS have proven to be fast, convenient, and sufficiently reliable.
In fact, in four years we have not lost a single byte of data due
to NFS. Yes, there have been bugs, such as actimeo and the
nfs-confused-client problem, but they were patched rather rapidly.
Conversely, because NFS is stateless, we have NEVER been bothered by
workstation crashes, which, with our demanding distributed research
applications, have occurred all too frequently :-).

|>That UDP `protocol' really sucks the mop. Soft/hard mounts. What a joke.
|>What's needed is a connection based stream protocol. Then you know the
|>difference between remote slow and remote dead. It's all a question
|>of flow control. NFS has none. Not even sequence numbers. Hmm.

Simply switching to TCP or another stream protocol will not
differentiate between "remote slow and remote dead". If anything,
TCP tends to hide remote failures. If your connection is explicitly
dropped by the remote host, you will know immediately, but other
link or kernel failures can be hidden for a long time by TCP retries
and adaptive algorithms.

|>We run a lot of NFS here, and it's as flakey as C shell. Two of the machines
|>here just go to sleep every once in a while, when the traffic gets a little
|>strong. God knows why. It's going to take a lot of pondering to track it
|>down. Even then, it's probably a fundamental design problem that can't,
|>or won't, be fixed.

We run a lot of NFS here, too, and are very happy with it. We
certainly don't go blaming all of our application failures on it
without some evidence. If your needs dictate tighter, stateful
service, then by all means feel free to use AFS, but realize that
many other people are happy with NFS.

dave safford
Texas A&M University
auvsaff@auvsun1.tamu.edu
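
A minimal sketch of the TCP point made above, not taken from the
original post. It assumes a toy loopback server (the names probe,
serve_once and run are hypothetical) and shows what a client on a
stream connection can actually observe: an explicit close is visible
at once, but a peer that is merely slow and a peer that is dead both
look like silence until the application's own deadline expires.

```python
import socket
import threading


def probe(host, port, deadline):
    """Send one request and classify what the client can observe.

    Returns "reply", "closed" (peer explicitly dropped the connection),
    or "silent" (nothing arrived before the deadline; a slow peer and a
    dead peer are indistinguishable here).
    """
    with socket.create_connection((host, port), timeout=deadline) as sock:
        sock.sendall(b"ping\n")
        try:
            data = sock.recv(64)
        except socket.timeout:
            return "silent"
        except ConnectionResetError:
            return "closed"
        return "reply" if data else "closed"


def serve_once(listener, behaviour, hold):
    """Hypothetical test server: accept one connection, act per `behaviour`."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(64)                 # consume the request
        if behaviour == "reply":
            conn.sendall(b"pong\n")
        elif behaviour == "silent":
            hold.wait()               # stay connected but say nothing
        # behaviour == "closed": fall through and close immediately


def run(behaviour, deadline=0.5):
    """Start the toy server on a free loopback port and probe it once."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    hold = threading.Event()
    server = threading.Thread(target=serve_once,
                              args=(listener, behaviour, hold))
    server.start()
    try:
        return probe("127.0.0.1", listener.getsockname()[1], deadline)
    finally:
        hold.set()                    # release a "silent" server
        server.join()
        listener.close()


if __name__ == "__main__":
    for behaviour in ("reply", "closed", "silent"):
        print(behaviour, "->", run(behaviour))
```

In the "silent" case the toy server is alive the whole time, yet the
client learns nothing until its own timeout fires. The deadline is a
policy chosen by the application, not information supplied by TCP.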