Path: utzoo!attcan!uunet!wuarchive!brutus.cs.uiuc.edu!apple!chuq
From: chuq@Apple.COM (Chuq Von Rospach)
Newsgroups: comp.protocols.nfs
Subject: Re: mountd Performance under Stress
Keywords: mountd nfs performance
Message-ID: <34283@apple.Apple.COM>
Date: 24 Aug 89 19:46:01 GMT
References: <1577@dsacg3.UUCP>
Organization: Life is just a Fantasy novel played for keeps
Lines: 56

>The mount server appears to be becoming a bottleneck for an application in
>which we've a large number of PC clients accessing data on a minicomputer
>server. On occasion we can have quite a few users issuing multiple mount
>requests simultaneously. When this happens we see some of the requests time
>out, while users accessing already mounted files continue to receive good
>service.

Definitely. For a good time, set up a machine exporting USENET to three or
four hundred machines and then have it crash for 24 hours. All of the NFS
servers jump on it as soon as it comes back up, and I've seen mount requests
sit two hours waiting to happen.

>The mount server has to read /etc/exports, and to do the host name to IP
>address translation would also have to access /etc/hosts (or the name
>server), and it writes /etc/rmtab. So we thought mountd might be having
>trouble getting to /etc. But ps "snapshots" showed mountd rarely waiting
>on disk.

The disk activity of mountd is fairly trivial. Doing hostname lookups via
Yellow Pages clears out a good bit of it, since you aren't sequentially
searching the host table.

Imagine, though, what's happening at the network layer. 50-100 (or more)
machines are all trying to create connections to the mountd at once.
It's spinning away, dealing with them as fast as it can, but the ethernet
buffers are all clogged with incoming packets, the mbuf pool is wedged full
of pending requests that are already in the queue (making it tough,
sometimes, for the mountd to get the memory it needs to return an fhandle to
the client so it can finish a given request), packets are being dropped on
the floor, clients are timing out and sending repeat requests -- it gets
*really* nasty. You end up essentially thrashing at a couple of layers in
the kernel and sending lots and lots of ethernet packets all over
everywhere. It isn't really a CPU bottleneck, although a faster CPU will
help somewhat.

The problem, from what I've seen, is that the statelessness of NFS makes it
impossible for the client to tell whether the server has never seen its
request (as opposed to knowing about it and not acting on it yet). So it has
to assume the request disappeared and send it out again when it times out.
This is correct most of the time, but not in this kind of worst-case
scenario.

One way to minimize it under the current scheme would be to make the "mount
request timeout" a sliding scale similar to ethernet packet collision
delays -- every time it times out, the client waits a little longer (with a
randomizing factor tossed in) before sending the request again. That doesn't
reduce the mounting load; it simply spreads it out further in time. It
doesn't hurt the normal case, and would reduce some of the clogging in the
worst-case scenario.

chuq

Chuq Von Rospach =|= Editor,OtherRealms =|= Member SFWA/ASFA
chuq@apple.com =|= CI$: 73317,635 =|= AppleLink: CHUQ
[This is myself speaking. No company can control my thoughts.]
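
For what it's worth, the sliding-scale retry timeout described above could be
sketched roughly like this in Python. This is only an illustration, not code
from any real NFS client: the function name `mount_retry_delay`, the
one-second base, and the 60-second cap are all made-up assumptions, and a
real client would pick its own values.

```python
import random


def mount_retry_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Seconds to wait before resending a mount request.

    `attempt` counts the timeouts seen so far (0 for the first retry).
    `base` and `cap` are hypothetical tuning values, not taken from any
    actual mount client.
    """
    # Double the window on every timeout, up to a fixed ceiling.
    # The exponent is clamped so very large attempt counts stay cheap.
    window = min(cap, base * 2 ** min(attempt, 30))
    # Randomizing factor: wait somewhere in the upper half of the window,
    # so delays still grow but clients stop retrying in lockstep.
    return window / 2 + rng.uniform(0, window / 2)


if __name__ == "__main__":
    for attempt in range(8):
        print(attempt, round(mount_retry_delay(attempt), 2))
```

Each client draws its own random delay, so a few hundred machines that all
timed out together end up spread across the window instead of hitting the
mountd again at the same instant.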