Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!icdoc!qmw-cs!liam
From: liam@cs.qmw.ac.uk (William Roberts;)
Newsgroups: comp.protocols.nfs
Subject: Re: Buffering in biod and nfsd.
Message-ID: <2805@redstar.cs.qmw.ac.uk>
Date: 20 Dec 90 20:25:44 GMT
References: <1990Dec15.071319.16674@objy.com>
Sender: usenet@cs.qmw.ac.uk
Lines: 135
Nntp-Posting-Host: whitesand

In <1990Dec15.071319.16674@objy.com> peter@prefect.Berkeley.EDU (Peter Moore) 
writes:


>biod:
>	I have seen biod described as read-ahead and write-behind.
>	Which implies it both caches writes (in the sense that it
>	returns before the write is actually done) and it actually
>	reads more blocks than requested, in anticipation of the
>	additional blocks being used in future calls.  So:


No - biods are about extra, kernel-initiated I/O which isn't directly
associated with an ordinary process. Specifically, if the kernel decides that
you are reading a file sequentially and that it should attempt to read the
next block of the file in advance of your process actually asking for it, then
it would normally just add that block address to the list of things for the
disk device driver to do. For NFS, the biod is used instead so that the kernel
can keep track of the request it made to the server and what to do when the
answer comes back. The existing kernel mechanisms need process slot entries
to wait for I/O, so the biods provide such slots.

Similarly, the biods can be used to handle cache flushes to remote files.
>	
>	a) Does it return before the actual NFS-write is complete?
No - the server should only reply when the write has actually reached the disk
surface.

>	b) Again, if a) is true, is there any way for the user to find
>	   that the write failed?
If the biod did the write, then the user process finds out at the next 
operation on that file: remember to check the value returned by close()!
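
As a minimal sketch of that advice (the helper name write_file_checked is
made up for illustration, and nothing in it is NFS-specific), a C routine
that treats a failing close() as a failed write might look like this:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to path.
 * Returns 0 on success, or -1 with errno set by the failing call. */
int write_file_checked(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the write */
            int saved = errno;
            close(fd);                  /* already failing; keep first errno */
            errno = saved;
            return -1;
        }
        done += (size_t)n;
    }

    /* A write-behind failure may only be reported here, so the
     * result of close() is checked rather than discarded. */
    if (close(fd) < 0)
        return -1;
    return 0;
}
```

A caller should treat a -1 return from this routine exactly as it would a
failed write(): the data cannot be assumed to have reached the server.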

>	c) If so, is there any way a user process can assure that a
>	    particular block or all of its writes have been
>	    written yet?  In particular does fsync work or is it (as I
>	    have heard) a no-op?
The fsync() call works as it should do, namely it flushes all locally cached 
writes to the disk surface (local OR remote) and returns only when every block 
has been completely written out.
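
A hedged sketch of how a process might rely on this: append_durably is a
hypothetical helper, and a short write is simply treated as a failure to
keep the example small.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append one record to path and report success
 * only after fsync() has returned without error.
 * Returns 0 on success, -1 on any failure (including a short write). */
int append_durably(const char *path, const void *rec, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    int ok = (write(fd, rec, len) == (ssize_t)len);

    /* Flush locally cached writes for this file before claiming success. */
    if (ok && fsync(fd) < 0)
        ok = 0;

    /* close() can still report an error, so check it as well. */
    if (close(fd) < 0)
        ok = 0;

    return ok ? 0 : -1;
}
```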

>	d) Does biod actually read-ahead?
Yes. The kernel decides that a read-ahead is required, then a free biod is 
chosen to ask the server for that block.
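
The decision itself can be illustrated with a toy model. This is not the
SunOS code: struct ra_state and ra_next are invented names, and the
"previous block plus one" rule is an assumed stand-in for whatever test a
real kernel applies.

```c
/* Hypothetical per-file read-ahead state; not a real kernel structure. */
struct ra_state {
    long last_block;    /* last block read, or -1 if nothing read yet */
};

/* Record a read of `block`. Returns the block number that should be
 * fetched in advance if the access looks sequential, or -1 if no
 * read-ahead should be issued. In the NFS case the returned block is
 * what a free biod would be asked to request from the server. */
long ra_next(struct ra_state *st, long block)
{
    int sequential = (st->last_block >= 0 && block == st->last_block + 1);

    st->last_block = block;
    return sequential ? block + 1 : -1;
}
```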

>	e) If so, how does it decide when to flush the cached data and
>	   actually re-read the data?
Flushing cached data is about write-behind, not read-ahead. The only way to 
get at data in a file on the NFS server is via a file handle (i.e. it isn't 
block level access). All NFS servers provide a "last modified date" on their 
files, so the clients can do a loose form of cache checking by recording (in 
the vnode) the last modify time of the file. Every operation on the file 
returns the new modify date. If our local record of the modify date is older 
than 3 seconds (typically, see the actimeo option in later SunOS systems) then 
we stat the remote file to see if anyone else has modified it. If the modify 
time is unchanged then our cached information is valid. If someone else has 
modified the file, we retaliate by flushing *our* changes.... basically this 
is going to be even worse than two processes writing to the same file in a 
local filesystem.
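
The checking logic described above can be sketched roughly as follows. The
structure and function names are hypothetical, the timeout is passed in as
a parameter rather than read from any mount option, and a real client keeps
this state in the vnode rather than in a separate structure.

```c
#include <time.h>

/* Hypothetical client-side record for one cached file. */
struct cached_attrs {
    time_t mtime;       /* server's modify time as last reported to us */
    time_t checked_at;  /* local time at which that report arrived */
};

enum cache_action { CACHE_USE, CACHE_REVALIDATE };

/* Decide whether the cache may be used without asking the server:
 * if our record is older than `timeout` seconds, we must stat the
 * remote file first. */
enum cache_action cache_check(const struct cached_attrs *c,
                              time_t now, time_t timeout)
{
    return (now - c->checked_at > timeout) ? CACHE_REVALIDATE : CACHE_USE;
}

/* Apply a fresh modify time obtained from the server.
 * Returns 1 if the cached data is still valid, 0 if someone else has
 * modified the file and the cached data must be thrown away. */
int cache_revalidate(struct cached_attrs *c, time_t server_mtime, time_t now)
{
    int valid = (server_mtime == c->mtime);

    c->mtime = server_mtime;
    c->checked_at = now;
    return valid;
}
```

Note that between two checks the client simply trusts its cache, which is
why the consistency is only "loose".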

>	f) Is there any way a user process can affect that cacheing?
SunOS 4.x provides a mount option called "actimeo" which allows you to set
that timeout. You can force a cache flush using the standard "sync" command or
the sync() system call.


>nfsd:
>	a) Does the nfsd write back directly to disk, or maintain
>	   a personal cache?  (My understanding is that modulo
>	   WRITECACHE, it definitely does not, in fact it even flushes
>	   the OS cache).
nfsds write to the file SYNCHRONOUSLY, otherwise a server crash could lose
information. The client is assumed to forget what it told the server once the 
NFS write has completed, so if the server didn't store that data on its disk 
then a server crash loses the whole lot. Writecache is an abomination.
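
For illustration only, here is a user-level analogue of that rule; it is
not nfsd source. stable_write is an invented name, the path stands in for
an NFS file handle, and fsync() stands in for whatever the server kernel
actually does to force the data out before replying.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request handler: write len bytes at offset in the file
 * named by path. No state is kept between calls, so every request names
 * the file and the offset itself. Returns 0 (the point at which a reply
 * could be sent) only after the data has been written and fsync() has
 * succeeded; returns -1 otherwise. */
int stable_write(const char *path, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n <= 0) {               /* error, or no progress possible */
            close(fd);
            return -1;
        }
        p += n;
        offset += n;
        len -= (size_t)n;
    }

    if (fsync(fd) < 0) {            /* data not known to be stable */
        close(fd);
        return -1;
    }
    return close(fd) < 0 ? -1 : 0;
}
```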

>	b) If (heaven forbid and presto-serve not installed) it does cache
>	   writes, can this be flushed under user control?

Only by doing sync() system calls on the server.

>	c) Does it do any read-ahead/read-cacheing (I would certainly
>	   hope it wouldn't)
nfsds probably don't do the readahead, because they are stateless and don't
remember what they were asked for before. They do operate through the server
disk cache as normal (they have to, to get easy access to the code which
understands the file system structure) so the data they read is placed into 
the cache while they sleep waiting for it. They also benefit from cache hits 
etc.

>	d) If (again, heaven forbid) it does do read-cacheing, can that
>	   be flushed under user control?
There is no way, to my knowledge, that *anyone* can invalidate the whole cache 
short of unmounting the disk. The nfsds live in the *server* remember, so they 
stand to gain from the standard caching of reads & writes to their local 
filesystem.

>... But I hope this is not true, since it makes NFS mounted file systems
>pure poison for any one doing distributed database work.
Yep, 100% not-appropriate poison. NFS is very good for shared read-only things 
such as binaries, libraries etc, and fairly good for read/write from a single 
client. It is therefore very useful for personal filestores, since there is 
only one little me and I'm not often modifying files from two different clients
at the same time. But don't try to use it for distributed databases: that
isn't what it was designed for and it won't work.

>Most, if not all, of these problems can be eliminated by directly
>connecting to the nfsd, and doing the RPC calls directly, but that is
>fairly drastic.
Wrong again. If you are prepared to do RPC, then write your own database 
server that does what you actually want to do: NFS contains no user-serviceable
parts!

>      Anyway, thanks for whatever help you can give me,
You're welcome. You should look up the original Sandberg paper on NFS, which is

   %A R. Sandberg
   %T The Sun Network File System: Design, Implementation and Experience
   %D Summer 1985
   %J Sun Technical Report
   %I Sun Microsystems Inc.

It was probably presented at USENIX at around that sort of time, but I don't 
have a better reference for it.

PS. Does comp.protocols.nfs have a "Frequently asked questions"? I'd imagine 
that "What does a biod do?" and "What does an nfsd do?" would be near the top 
of the list :-)
--

William Roberts                 ARPA: liam@cs.qmw.ac.uk
Queen Mary & Westfield College  UUCP: liam@qmw-cs.UUCP
Mile End Road                   AppleLink: UK0087
LONDON, E1 4NS, UK              Tel:  071-975 5250 (Fax: 081-980 6533)