Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!convex!convex.convex.com!thurlow
From: thurlow@convex.com (Robert Thurlow)
Newsgroups: comp.protocols.nfs
Subject: Re: NFS writes and fsync().
Message-ID:
Date: 14 Oct 90 14:44:35 GMT
References: <1990Oct9.152612@objy.objy.com> <1990Oct14.082712.10811@objy.com>
Sender: news@convex.com
Lines: 78

In <1990Oct14.082712.10811@objy.com> peter@prefect.uucp (Peter Moore) writes:

>> Stop right there.  Your 'disk' has just lost data, period.  Do you
>> expect your local disk to ever do that?

>Yes I do expect my local disk to do that.  As you sort of mention,
>local writes under almost all Unix systems are asynchronous.  The
>write returns immediately, but the data stays in the buffer pool until
>either it is pushed out to make room for more active pages or until
>sync() is called.  Typically sync() is called every 30 seconds by the
>update daemon.  So you have no guarantee that your last 30 seconds of
>local I/O ever make it to disk unless you explicitly do a sync.  And if
>something does go wrong (a unrecoverable bad block, drive off line, or
>a full crash during the 30 second period), there is no way to signal
>back to you that it failed.  Heck, your process could have exited
>before the write failed.

I agree that write(2) won't return you an error in general, but
processes can, at any point they wish, call fsync() to ensure the
data is secured.  That ability is lost if the server is acknowledging
only the receipt of the request.  close(2) will fail if you can't
secure your writes, as well, though people are very poor at paying
attention to the return code.  If you throw away this ability for
a process to get an accurate indication of success, you've definitely
made it impossible to trust database I/O over NFS.  You've also lost
the ability to choose synchronous writes.

>> The effects could be very
>> devastating, depending on what exactly cared about the data.
>> Think of the havoc you could wreak on a database server.

>But, as I pointed out, this effect can happen on local writes too.
>That is why any `database-like' application must explicitly call
>fsync() if it wishes to guarantee that pages have made it to disk.  No
>recoverable system can depend on the write() alone when writing to
>local disk.  So synchronous NFS isn't helpful to the database people, at
>least for that reason.  They are already doing the right thing with
>explicit syncs just to make it work locally.

This is the problem: after a server says "Yo!", your client need never
write that data again, FSYNC() OR NOT, because it "trusts" the server
and can in no way tell it should not.  The block may be in your buffer
cache, but I/O is marked complete; fsync() will lie, likely without
even going over the wire.  You just can't trust an fsync() anymore,
period.

>This is why synchronous NFS writes seems to be unmotivated to me.
>It is MORE synchronous than local Unix I/O (assuming that network
>latency is a lot less than 30 seconds).  Why pay such a cost to make
>it MORE synchronous than we already are willing to live with on Unix?

Networks go down a _lot_ more often than local disk in my experience.
People kick out cables, router boxes fail, network adaptors hang,
machines get powered down, and of course, the local server disk can
fail :-)

>Now none of these arguments are overwhelming, but they do add up.  I
>am not trying to argue that NO one needs or wants synchronous NFS.  I
>am arguing that not everybody does, (and I believe, but can't defend
>better than the above, that MOST people don't need it).

I like the idea of having an option.  But I'm sufficiently convinced
that it should be the default.

>>
>This is exactly the sort of thing I want.  Now I just need it on all
>my machines as an option.

You're right; it's a real issue getting new features out across the
installed base so that everyone can count on them; Brian Pawlowski of
Sun underlined it in his talk at the ONC/NFS Industry Networking
Conference last week.  I think we have to try to cut time-to-market
for new functionality and get this stuff out there faster.

Rob T
--
Rob Thurlow, thurlow@convex.com or thurlow%convex.com@uxc.cso.uiuc.edu
----------------------------------------------------------------------
"This opinion was the only one available; I got here kind of late."
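
For anyone following the fsync()/close() argument in this thread, here is a minimal sketch in C of the pattern under discussion: a process that wants an accurate indication of success checks the return codes of write(2), fsync(2) and close(2), not just write(2). The helper name write_durably() and the file mode are assumptions made for illustration; they do not come from the thread. Whether a successful fsync() on an NFS client really means the data is on the server's disk depends on the client and server implementation, which is exactly the point being argued.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write len bytes to path
 * and return 0 only if write(), fsync() and close() all succeed.
 * Returns -1 on any failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* write() may accept fewer bytes than requested, so loop. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            perror("write");
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Ask for the data to be pushed to stable storage before going on. */
    if (fsync(fd) < 0) {
        perror("fsync");
        close(fd);
        return -1;
    }

    /* close() can report a deferred write error; do not ignore it. */
    if (close(fd) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

A caller would treat any nonzero return as "the data may not be safe", for example by retrying or aborting the transaction, instead of assuming the write succeeded because write() itself returned without error.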