Path: utzoo!attcan!uunet!wuarchive!brutus.cs.uiuc.edu!apple!chuq
From: chuq@Apple.COM (Chuq Von Rospach)
Newsgroups: comp.protocols.nfs
Subject: Re: mountd Performance under Stress
Keywords: mountd nfs performance
Message-ID: <34283@apple.Apple.COM>
Date: 24 Aug 89 19:46:01 GMT
References: <1577@dsacg3.UUCP>
Organization: Life is just a Fantasy novel played for keeps
Lines: 56


>The mount server appears to be becoming a bottleneck for an application in
>which we've a large number of PC clients accessing data on a minicomputer
>server. On occasion we can have quite a few users issuing multiple mount
>requests simultaneously. When this happens we see some of the requests time
>out, while users accessing already mounted files continue to receive good
>service.

Definitely. For a good time, set up a machine exporting USENET to three or
four hundred machines and then have it crash for 24 hours. All of the NFS
servers jump on it as soon as it comes back up, and I've seen mount requests
sit two hours waiting to happen. 

>The mount server has to read /etc/exports, and to do the host name to IP
>address translation would also have to access /etc/hosts (or the name
>server), and it writes /etc/rmtab. So we thought mountd might be having
>trouble getting to /etc. But ps "snapshots" showed mountd rarely waiting
>on disk.

The disk activity of mountd is fairly trivial. Doing hostname lookups via
Yellow Pages clears out a good bit of it, since you aren't sequentially
searching the host table.

Imagine, though, what's happening at the network layer. 50-100 (or more)
machines are all trying to create connections to the mountd at once. It's
spinning away, dealing with them as fast as it can, but the ethernet buffers
are all clogged with incoming packets, the mbuf pool is wedged full of
pending requests that are already in the queue (making it tough, sometimes,
for the mountd to get the memory it needs to return an fhandle to the client
so it can finish a given request), packets are being dropped on the floor,
clients are timing out and sending repeat requests -- it gets *really* nasty.

You end up essentially thrashing at a couple of layers in the kernel and
sending lots and lots of ethernet packets all over everywhere. It isn't
really a CPU bottleneck, although a faster CPU will help somewhat.

The problem, from what I've seen, is that the statelessness of NFS makes it
impossible for the client to tell whether the server has never seen its
request (as opposed to knowing about it and not acting on it yet). So it has
to assume the request disappeared and send it out again when it times out.
This is correct most of the time, but not in this kind of worst-case
scenario. One way to minimize it under the current scheme would be to make
the "mount request timeout" be a sliding scale similar to ethernet packet
collision delays -- every time it times out, the client waits a little
longer (with a randomizing factor tossed in) before sending the request
again. That isn't reducing the mounting load, but simply spreading it out
further in time. It doesn't hurt the normal case, and would reduce some of
the clogging in the worst-case scenario.
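
To make the sliding-scale idea concrete, here is a rough sketch in Python of
how a client might compute its waits between retransmissions. Everything in
it is an assumption for illustration: the function name backoff_delays, the
parameters (base, factor, cap, retries) and the choice of picking a random
wait between half the window and the full window. It is not how any
existing mount client behaves, and ethernet itself picks its random delay
differently; the only point is that the window grows after each timeout and
the randomizing factor keeps clients from retrying in lockstep.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, retries=6, rng=None):
    """Yield the wait (in seconds) before each retransmission.

    Hypothetical helper: all names and default values are illustrative
    assumptions, not taken from any real mount client.
    """
    rng = rng or random.Random()
    window = base
    for _ in range(retries):
        # Randomizing factor: wait somewhere between half the current
        # window and the full window, so clients that timed out together
        # do not all retransmit at the same instant.
        yield rng.uniform(window / 2, window)
        # Sliding scale: grow the window after every timeout, up to a cap.
        window = min(window * factor, cap)


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(), start=1):
        print("retry %d: wait %.2f seconds" % (attempt, delay))
```

With the defaults above, the windows run 1, 2, 4, 8, 16 and 32 seconds, so a
client that gets an answer on the first or second try barely notices, while
a few hundred clients hammering a freshly rebooted server end up spread over
a minute or so instead of all retrying on the same fixed timeout.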

chuq

Chuq Von Rospach      =|=     Editor,OtherRealms     =|=     Member SFWA/ASFA
         chuq@apple.com   =|=  CI$: 73317,635  =|=  AppleLink: CHUQ
      [This is myself speaking. No company can control my thoughts.]