Newsgroups: comp.protocols.nfs
Path: utzoo!utgpu!watserv1!watmath!att!linac!Firewall!genesis!kdenning
From: kdenning@genesis.Naitc.Com (Karl Denninger)
Subject: Re: NFS performance
Message-ID: <1991Jun14.222604.13965@Firewall.Nielsen.Com>
Summary: More discussion and potential "solutions" (or a good shot at same)
Sender: news@Firewall.Nielsen.Com (Usenet News)
Nntp-Posting-Host: genesis.naitc.com
Organization: AC Nielsen Co., Bannockburn IL
References: <1991Jun13.234448.16172@Firewall.Nielsen.Com> <6743@eastapps.East.Sun.COM>
Date: Fri, 14 Jun 91 22:26:04 GMT

In article <6743@eastapps.East.Sun.COM> geoff@east.sun.com (Geoff Arnold @ Sun BOS - R.H. coast near the top) writes:
>Quoth droms@bucknell.edu (in ):
>#In article <1991Jun13.234448.16172@Firewall.Nielsen.Com> kdenning@genesis.Naitc.Com (Karl Denninger) writes:
>#
># >
># >If the server ACKs the data before writing it to disk, there is a window
># >during which the server can crash.  The data is then lost.
>#
># How does this differ from the standard "Unix" way of doing file I/O, which
># returns a successful reply from a write call before the data is safely on
># disk? .....
>#
>#I think the difference lies in the feedback to the user.  If the local
>#UNIX box crashes, the user is aware "something is wrong" immediately.
>#If the server crashes and reboots, the data can be lost silently...
>
>It's more than simply a vague "feedback to the user": it's a
>question of what assertions can be made about the correctness
>of file system operations.  Even though normal buffer cache
>operations can reorder some kinds of operation, I can code something
>like
>
>	write(file1, data1)
>	fsync(file1)
>	write(file2, "file1 was written successfully")
>
>(with appropriate error checking) and be confident that file2 will
>be written if and only if file1 was written.  Karl's "standard Unix way"
>doesn't apply here: if the machine crashes, the process will crash
>with it.
>If an NFS server could ack the first write (but not
>commit it to stable storage), then crash and reboot, the failure
>of the write would be undetectable.

Understood.  However, the issue is data loss, not reboot-n-continue behavior
or whether the process dies along with the machine.

If you soft mount directories (yes, I know this is dangerous) your process
will get an I/O failure if the server goes down -- indicating that you have
lost >something<.

Data loss is data loss -- with or without the process continuing to exist.

I would think that the real solution here would be to have a crashed and
rebooted server return some form of error on the next I/O request (what, I
don't know offhand, perhaps ENXIO) if you are mounted async and the server
crashes and reboots.  At least you'd be notified that there is a potential
data integrity problem that your software needs to investigate or report.

>The decision as to whether data should be written "safely" or not should
>logically rest with the client, not the server.  This is why the
>hack of an async server side configuration option is so dangerous.
>The correct approach, of course, is the (unimplemented) RFS_WRITECACHE
>NFS function.... >sigh< But for now, Prestoserve is the best solution.

>--Geoff Arnold, PC-NFS architect(geoff@East.Sun.COM or geoff.arnold@Sun.COM)--

AGREED.  The decision SHOULD be with the client.  I believe that many systems
would opt for the async choice, but I disagree with making it something you
don't have control over at the client level.

One other option would be to have fsync() on an NFS file return success only
if all operations since the last fsync() or open() had succeeded.  A crash is
an exception condition here, since the client will not have executed an
open() prior to the fsync() -- thus, in that case fsync() would return
failure.

If the client opens with O_SYNC, then you do only sync I/O.  On a close() do
an implied fsync(), and again return success only if all data "makes it".
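A minimal sketch of the fsync()/close() idea above, written as a toy model and not as real NFS client or server code. Everything named here (struct sync_state, the model_* functions, the choice of EIO as the error) is made up for illustration, and resetting the bit after every fsync() is only one reading of "since the last fsync() or open()".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file record; not part of any real NFS implementation. */
struct sync_state {
    bool clean;  /* true if every operation since the last open() or
                    fsync() is known to have succeeded */
};

/* open(): start a fresh interval with nothing outstanding. */
void model_open(struct sync_state *s)
{
    assert(s != NULL);
    s->clean = true;
}

/* Record the outcome of one write; any failure clears the bit. */
void model_write(struct sync_state *s, bool write_ok)
{
    assert(s != NULL);
    if (!write_ok)
        s->clean = false;
}

/* A server crash and reboot loses the record of the open, so the
   bit reads "no" until the next open() or fsync(). */
void model_server_reboot(struct sync_state *s)
{
    assert(s != NULL);
    s->clean = false;
}

/* fsync(): succeed only if the whole interval was clean, then begin a
   new interval. Returns 0, or -1 with errno set to EIO (an assumed
   error code; the post does not name one for this case). */
int model_fsync(struct sync_state *s)
{
    assert(s != NULL);
    bool was_clean = s->clean;
    s->clean = true;  /* executing fsync() starts the next interval */
    if (!was_clean) {
        errno = EIO;
        return -1;
    }
    return 0;
}

/* close(): an implied fsync(), with the same return convention. */
int model_close(struct sync_state *s)
{
    return model_fsync(s);
}
```

In this model a run of good writes followed by model_fsync() returns 0, while a model_server_reboot() anywhere between the writes and the fsync makes it return -1. That is the notification the proposal is after: the program learns at fsync() or close() time that something may not have made it.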
This does require keeping one bit of state around -- whether or not an "open"
or "fsync" has been executed (a noted I/O error rates a "no" to that
question).

This is very close to the semantics of a local filesystem, and should be
pretty easy to do.  It also doesn't affect anything on existing software
(except that reliability for programs that don't do an fsync() or check
close() return values is at risk, but on a local disk in this case they
would be too!)

This is what one would expect on a local disk in the event of a disk failure
-- if you didn't check close()'s return value you might mistakenly think your
data all got there when it didn't.

Prestoserve is not a total safety net -- it's hardware, and CAN fail.  The
risks there are exactly the same as a crash/disk failure/whatever.  The only
real saving grace there is that it doesn't fail often, having no moving
parts.

--
Karl Denninger - AC Nielsen, Bannockburn IL (708) 317-3285
kdenning@nis.naitc.com
"The most dangerous command on any computer is the carriage return."
Disclaimer: The opinions here are solely mine and may or may not reflect
            those of the company.