Xref: utzoo comp.unix.ultrix:7400 comp.protocols.nfs:2372
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!wuarchive!uunet!mcsun!ukc!strath-cs!baird!jim
From: jim@cs.strath.ac.uk (Jim Reid)
Newsgroups: comp.unix.ultrix,comp.protocols.nfs
Subject: Re: nfsd 4, why, and how to tune...
Message-ID:
Date: 28 May 91 10:09:39 GMT
References: <119@janis.UUCP> <21936@cbmvax.commodore.com>
Sender: jim@cs.strath.ac.uk
Organization: Computer Science Dept., Strathclyde Univ., Glasgow, Scotland.
Lines: 37
In-reply-to: grr@cbmvax.commodore.com's message of 26 May 91 23:20:38 GMT

In article <21936@cbmvax.commodore.com> grr@cbmvax.commodore.com (George Robbins) writes:

   I'm really curious whether the Ultrix behavior is a result of bugs
   or simply the way that all NFS servers act. The worst case seems
   to be "find" which reads "directories" rather than "files", which
   I believe are different classes of operation under NFS. It may be
   that "stateless" behavior that NFS implements turns sequentially
   "reading" a directory into some highly cpu intensive search and
   search again algorithm.

   [ for c.p.nfs types: a client doing a "find" against an Ultrix NFS
   exported filesystem brings the server to its knees, with the NFS
   daemons sharing ~100% of the CPU time amongst themselves... Ouch.
   This happens often enough to be a recognizable syndrome and prompts
   a witch hunt to find which client is up to mischief ]

Any recursive directory traverse via NFS can be painful (du is just as
bad as find). This is because the client makes LOTS of NFS requests:
several read-directory requests to get the file names and the file
handles, followed by a get-file-attributes request for each file. If
the client is faster at sending these out than the server is at
replying, this is bad news. The server will be bombarded with NFS
requests which it can't service quickly enough. The requests time out,
so the client sends them all over again, saturating the server once
more and closing the loop.

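To put rough numbers on that loop, here is a minimal sketch in Python.
It is only a model, not measured Ultrix behaviour. It assumes the
client's timeout doubles after every retransmission (common for NFS
over UDP, but an assumption here) and that times are in tenths of a
second. The function name `transmissions` and the values for `timeo`,
`retrans` and the reply latency are made up for illustration.

```python
def transmissions(reply_latency, timeo, retrans):
    """Count the copies of one request a client sends before the reply
    to the first copy arrives, or before it gives up.

    All times are in tenths of a second. Assumes the timeout doubles
    after each retransmission (an assumption, not a documented fact
    about any particular NFS client).
    """
    sent = 1          # the original request
    elapsed = 0       # time since the original request went out
    timeout = timeo   # current retransmission timer
    while sent <= retrans:           # at most `retrans` retransmissions
        elapsed += timeout           # wait for the current timer to expire
        if reply_latency <= elapsed:
            break                    # reply arrived before the timer ran out
        sent += 1                    # no reply yet: send another copy
        timeout *= 2                 # back off before the next attempt
    return sent


# A loaded server that takes 2.5 seconds to answer each request:
impatient = transmissions(reply_latency=25, timeo=7, retrans=4)
patient = transmissions(reply_latency=25, timeo=30, retrans=4)
print(impatient, patient)
```

In this model the impatient client sends every request three times
while the patient one sends it once, so the same traverse costs the
server three times as many requests. A real server's latency also
rises with load, which the sketch ignores, so the real effect is
likely worse.
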
Another nasty is that the client and server file attribute caches will
get flushed and refilled with entries from the traverse. This can mean
that heavily used cache entries are thrown out to make way for those
at the tail of the directory traverse.

Increasing the number of nfsds on the server may help in this
situation, but I doubt it. [It's already working the disk as hard as
it can, so another nfsd process to enqueue requests to the server's
disk driver isn't going to help much.] A better solution is to
experiment with increased values for the timeout and retransmission
NFS mount parameters (usually the timeo and retrans mount options) ON
THE CLIENTS. This will make them behave less aggressively when the
server is having a hard time.

		Jim