Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!icdoc!qmw-cs!liam
From: liam@cs.qmw.ac.uk (William Roberts;)
Newsgroups: comp.protocols.nfs
Subject: Re: Buffering in biod and nfsd.
Message-ID: <2805@redstar.cs.qmw.ac.uk>
Date: 20 Dec 90 20:25:44 GMT
References: <1990Dec15.071319.16674@objy.com>
Sender: usenet@cs.qmw.ac.uk
Lines: 135
Nntp-Posting-Host: whitesand

In <1990Dec15.071319.16674@objy.com> peter@prefect.Berkeley.EDU (Peter Moore) writes:

>biod:
> I have seen biod described as read-ahead and write-behind.
> Which implies it both caches writes (in the sense that it
> returns before the write is actually done) and it actually
> reads more blocks than requested, in anticipation of the
> additional blocks being used in a future calls. So:

No - biods are about extra, kernel-initiated I/O which isn't directly
associated with an ordinary process. Specifically, if the kernel
decides that you are reading a file sequentially and that it should
read the next block of the file before your process actually asks for
it, then for a local file it would normally just add that block
address to the list of things for the disk device driver to do. For
NFS, a biod is used instead, so that the kernel can keep track of the
request it made to the server and of what to do when the answer comes
back. The existing kernel mechanisms need process slots to wait for
I/O, so the biods provide such slots. Similarly, the biods can be used
to handle cache flushes to remote files.

>
> a) Does it return before the actual NFS-write is complete?

No - the server should only reply when the write has actually reached
the disk surface.

> b) Again, if a) is true, is there any way for the user to find
> that the write failed?

If a biod did the write, then the user process finds out at the next
operation on that file: remember to check the value returned by close()!
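The read-ahead path described above can be sketched as a toy model in C. This is not SunOS kernel code: the `file_state` structure, the fixed pool of `NBIODS` biod slots and the functions `find_free_biod()`, `maybe_read_ahead()` and `biod_done()` are all hypothetical, and the rule that a read-ahead is simply skipped when no biod is free is an assumption of this sketch. It shows the two ideas from the text: the kernel guesses that access is sequential (the requested block follows the previous one), and it hands the extra request to a free biod, whose slot stays busy until the server's answer comes back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-file state, loosely modelled on what a client might
 * keep in the vnode; NOT the real SunOS structure. */
struct file_state {
    long last_block;   /* last block the process asked for, -1 if none */
};

/* Hypothetical pool of biod "process slots". */
#define NBIODS 4
static bool biod_busy[NBIODS];

/* Return the index of a free biod, or -1 if all are busy. */
static int find_free_biod(void)
{
    for (int i = 0; i < NBIODS; i++)
        if (!biod_busy[i])
            return i;
    return -1;
}

/* Called when the process reads `block`. If the access looks sequential,
 * hand a read-ahead of block+1 to a free biod and return its index.
 * Returns -1 if no read-ahead was started: either the access was not
 * sequential, or no biod was free (this sketch assumes the read-ahead
 * is just skipped in that case). */
static int maybe_read_ahead(struct file_state *fs, long block)
{
    bool sequential = (fs->last_block >= 0 && block == fs->last_block + 1);
    fs->last_block = block;
    if (!sequential)
        return -1;

    int b = find_free_biod();
    if (b < 0)
        return -1;

    biod_busy[b] = true;   /* this biod now waits for the server's reply */
    printf("biod %d: read-ahead of block %ld\n", b, block + 1);
    return b;
}

/* Called when the server's reply arrives: the biod slot becomes free. */
static void biod_done(int b)
{
    assert(b >= 0 && b < NBIODS);
    biod_busy[b] = false;
}
```

A real client would also have to put the returned data into its buffer cache when the reply arrives; that step is left out here to keep the focus on the sequential-access guess and the use of biod slots.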
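To act on the advice about close(), a minimal POSIX sketch of a writer that checks every step might look like the following. The function name `write_file_checked()` is hypothetical, not part of any library. It calls fsync() before close() so that errors from delayed (write-behind) blocks are reported at a known point, and it still checks the value returned by close(), since on NFS that may be where a failed delayed write is first reported.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path`, checking write(),
 * fsync() and close(). Returns 0 on success, -1 on failure with errno
 * set by the call that failed. */
static int write_file_checked(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: just retry */
            int saved = errno;
            perror("write");
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Push cached blocks out to the (possibly remote) disk now, so that
     * a failure is reported here rather than at some later operation. */
    if (fsync(fd) < 0) {
        int saved = errno;
        perror("fsync");
        close(fd);
        errno = saved;
        return -1;
    }

    /* On NFS a failed delayed write may only be reported by close(). */
    if (close(fd) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

For example, `write_file_checked("/tmp/out.txt", "hello\n", 6)` returns 0 when every step succeeds, while a path in a nonexistent directory makes open() fail and the function returns -1.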
> c) If so, is there any way a user process can assure that a
> particular block or all of its writes in have been
> written yet? In particular does fsync work or is it (as I
> have heard) a no-op?

The fsync() call works as it should, namely it flushes all locally
cached writes to the disk surface (local OR remote) and returns only
when every block has been completely written out.

> d) Does biod actually read-ahead?

Yes. Once the kernel decides that a read-ahead is required, a free biod
is chosen to ask the server for that block.

> e) If so, how does it decide when to flush the cached data and
> actually re-read the data?

Flushing cached data is about write-behind, not read-ahead.

The only way to get at data in a file on the NFS server is via a file
handle (i.e. it isn't block-level access). All NFS servers provide a
"last modified date" on their files, so the clients can do a loose
form of cache checking by recording (in the vnode) the last modify
time of the file. Every operation on the file returns the new modify
date. If our local record of the modify date is older than 3 seconds
(typically; see the actimeo option in later SunOS systems) then we
stat the remote file to see if anyone else has modified it. If the
modify time is unchanged then our cached information is valid. If
someone else has modified the file, we retaliate by flushing *our*
changes.... Basically this is going to be even worse than two
processes writing to the same file in a local filesystem.

> f) Is there any way a user process can affect that cacheing?

SunOS 4.x provides a mount option called "actimeo" which allows you to
set that timeout. You can force a cache flush using the standard
"sync" command or the sync() system call.

>nfsd:
> a) Does the nfsd the write back directly do disk, or maintain
> a personal cache? (My understanding is that modulo
> WRITECACHE, it definitely does not, in fact it even flushes
> the OS cache).
nfsds write to the file SYNCHRONOUSLY, otherwise a server crash could
lose information. The client is assumed to forget what it told the
server once the NFS write has completed, so if the server didn't store
that data on its disk then a server crash loses the whole lot.
Writecache is an abomination.

> b) If (heaven forbid and presto-serve not installed) it does cache
> writes, can this be flushed under user control?

Only by doing sync() system calls on the server.

> c) Does it do any read-ahead/read-cacheing (I would certainly
> hope it wouldn't)

nfsds probably don't do read-ahead, because they are stateless and
don't remember what they were asked for before. They do operate
through the server's disk cache as normal (they have to, to get easy
access to the code which understands the file system structure), so
the data they read is placed into the cache while they sleep waiting
for it. They also benefit from cache hits etc.

> d) If (again, heaven forbid) it does do read-cacheing, can that
> be flushed under user control?

There is no way, to my knowledge, that *anyone* can invalidate the
whole cache short of unmounting the disk. The nfsds live in the
*server*, remember, so they stand to gain from the standard caching of
reads & writes to their local filesystem.

>... But I hope this is not true, since it make NFS mounted file systems
>pure poison for any one doing distributed database work.

Yep, 100% not-appropriate poison. NFS is very good for shared
read-only things such as binaries, libraries etc, and fairly good for
read/write from a single client. It is therefore very useful for
personal filestores, since there is only one little me and I'm not
often modifying files from two different clients at the same time.
But don't try to use it for distributed databases: that isn't what it
was designed for and it won't work.

>Most, if not all, of these problems can be eliminated by directly
>connecting to the nfsd, and do the RPC calls directly, but that is
>fairly drastic.
Wrong again. If you are prepared to do RPC, then write your own
database server that does what you actually want to do: NFS contains
no user-serviceable parts!

> Anyway, thanks for whatever help you can give me,

You're welcome. You should look up the original Sandberg paper on NFS,
which is

%A R. Sandberg
%T The Sun Network File System: Design, Implementation and Experience
%D Summer 1985
%J Sun Technical Report
%I Sun Microsystems Inc.

It was probably presented at USENIX at around that sort of time, but I
don't have a better reference for it.

PS. Does comp.protocols.nfs have a "Frequently asked questions"? I'd
imagine that "What does a biod do?" and "What does an nfsd do?" would
be near the top of the list :-)

-- 
William Roberts                  ARPA: liam@cs.qmw.ac.uk
Queen Mary & Westfield College   UUCP: liam@qmw-cs.UUCP
Mile End Road                    AppleLink: UK0087
LONDON, E1 4NS, UK               Tel: 071-975 5250 (Fax: 081-980 6533)