Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!Firewall!genesis!kdenning
From: kdenning@genesis.Naitc.Com (Karl Denninger)
Newsgroups: comp.protocols.nfs
Subject: Re: NFS performance
Summary: You're missing the point! Standard Unix I/O is not "safe"
Message-ID: <1991Jun13.234448.16172@Firewall.Nielsen.Com>
Date: 13 Jun 91 23:44:48 GMT
References: <1991Jun13.164017.29944@Firewall.Nielsen.Com> <625@appserv.Eng.Sun.COM>
Sender: news@Firewall.Nielsen.Com (Usenet News)
Organization: AC Nielsen Co., Bannockburn IL
Lines: 70
Nntp-Posting-Host: genesis.naitc.com

In article <625@appserv.Eng.Sun.COM> lm@slovax.Eng.Sun.COM (Larry McVoy) writes:
>kdenning@genesis.Naitc.Com (Karl Denninger) writes:
>>
>> I don't quite understand the fanatacism with which people preach the NFS
>> stateless nature, O_SYNC and all that. The fact is that a crash of a
>> LOCAL Unix machine with the normal block buffering scheme can easily cause
>> the loss of data -- in this case, the write(2) call returned "ok" but it
>> really might not be "OK"! This is true whether the problem is later found
>> to be a bad disk sector, the machine panicing, or any one of a number of
>> other causes. Normal disk I/O on Unix machines is NOT reliable enough to
>> say "if you get a good return from write(), the data is safely on disk".
>
>NFS is stateless. The reason for this statelessness is so that a client
>does not need to do anything special when a server goes down. A dead
>server looks just like a slow server to a client. So far, so good.
>A client issues a write, the server ACKs the write. What does that ACK
>mean? It means that the client data is safe. The client kernel may
>throw away the data, the server has promised that the data can be
>retrieved.
>
>If the server ACKs the data before writing it to disk, there is a window
>during which the server can crash. The data is then lost.

How does this differ from the standard "Unix" way of doing file I/O, which
returns a successful reply from a write call before the data is safely on
disk?

If you write data and get back an "ACK" (or good return value), the data
isn't necessarily on disk -- it could be in the buffer cache. If the
machine crashes before the data is flushed, you lose.

I can't see how this is any different from ACKing packets from NFS clients
when you haven't actually written them any further than the buffer cache
(exactly the same as the standard Unix semantics). You have the same risks
if the server (the machine with the disk on it :-) crashes as you would
with a local workstation or server drive. In both cases data can be lost.

>MIPS systems have an unsafe export option that allows you to turn off
>this constraint - big performance win, big safety lose.

There is no export option in the manual pages for RiscOS 4.51 which
addresses what you're talking about. I just checked again; it's not there.

>There are other ways to address this problem without breaking the
>semantics of NFS. One such way is to buffer the writes in NVRAM.

Like Legato's PrestoServe. Yes, I know. That is not completely safe
either. You could have "something happen" to the presto board -- and your
data would be lost.

The point is that standard Unix machines often say "your data is safe"
when it really isn't. In fact, ALL systems, by virtue of the fact that
hardware can fail, make this assumption. I don't see what you buy by
having the default for NFS transactions be more "safe" than a local disk
drive -- other than making recovery from crashes simple for the client
side.

I would think that one of the easiest ways to address this would be to
allow an option to have "safe" or "unsafe" writes on a per-mount basis.
This lets users choose their own level of performance and risk. I'd be
for that.

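To make the local-disk comparison concrete, here is a minimal sketch in
Python of the two behaviours being argued about. The function names
buffered_write and durable_write are hypothetical and chosen only for
illustration. os.write() maps onto write(2) and os.fsync() onto fsync(2).
The sketch assumes a Unix-like system. Even the fsync() path only asks the
kernel to flush; it cannot protect against the hardware failures mentioned
above.

```python
import os
import tempfile


def buffered_write(path, data):
    """Plain write(2): a good return only means the kernel accepted the data."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # The bytes may still be sitting in the buffer cache when this returns.
        return os.write(fd, data)
    finally:
        os.close(fd)


def durable_write(path, data):
    """write(2) followed by fsync(2): wait for the kernel to flush the file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = os.write(fd, data)
        # Blocks until the kernel reports the file's data has been flushed.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fast = buffered_write(os.path.join(tmp, "fast.dat"), b"x" * 4096)
        safe = durable_write(os.path.join(tmp, "safe.dat"), b"x" * 4096)
        print("buffered bytes accepted:", fast)
        print("flushed bytes accepted: ", safe)
```

Both calls report the same byte count; the only difference is how long
the caller waits and what a crash right after the return could cost. That
is the same trade a per-mount "safe"/"unsafe" option would expose.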
--
Karl Denninger - AC Nielsen, Bannockburn IL     (708) 317-3285
kdenning@nis.naitc.com
"The most dangerous command on any computer is the carriage return."
Disclaimer: The opinions here are solely mine and may or may not reflect
those of the company.