Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!ucsd!tut.cis.ohio-state.edu!brutus.cs.uiuc.edu!wuarchive!wugate!uunet!mcsun!ukc!icdoc!qmc-cs!liam
From: liam@cs.qmc.ac.uk (William Roberts)
Newsgroups: comp.protocols.nfs
Subject: Re: mountd Performance under Stress
Summary: Race condition + rmtab considered harmful
Keywords: mountd nfs performance
Message-ID: <1199@sequent.cs.qmc.ac.uk>
Date: 31 Aug 89 19:16:59 GMT
References: <1577@dsacg3.UUCP> <34283@apple.Apple.COM>
Reply-To: liam@cs.qmc.ac.uk (William Roberts)
Organization: Computer Science Dept, Queen Mary College, University of London, UK.
Lines: 73
Expires:
Sender:
Followup-To:
Distribution:

>>On occasion we can have quite a few users issuing multiple mount
>>requests simultaneously. When this happens we see some of the requests time
>>out, while users accessing already mounted files continue to receive good
>>service.

This is a difference between user-level RPC and kernel-level RPC. The
kernel level *knows* that its NFS RPC requests are idempotent and so it
doesn't change the xid when it sends a retransmission. This means that
the first reply is acceptable no matter how many retransmissions have
occurred. The user level makes no such guarantee, so there is a new xid
for each retransmission. In particular, this means that the mount
program's RPC requests to the mount daemon *have* to be answered before
the timeout period is up, otherwise the reply is discarded as out of
date.

Ultimately this becomes a race condition, especially as the mount
requests are small and the machine can buffer lots of them. We had an
NFS server with 40 clients that was a 0.5 MIPS Whitechapel MG1 - when
all 40 clients rebooted after a power failure it was taking about 3
minutes from a client sending a request to the mountd sending the
reply, by which time a lot of 25 second timeouts had gone by. Funny
thing is, every mountd response is identical, so the first one would do
and the rest can be discarded....
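
The xid race described above can be sketched as a small simulation.
This is an illustration only, not the real RPC library: the function
name, the retry count of 10 and the fixed server delay are all
assumptions made for the example. The 25 second timeout and the
roughly 3 minute (180 second) delay are the figures quoted above.

```python
def first_accepted_reply(server_delay, timeout, retries, reuse_xid):
    """Return the time at which the client accepts a reply, or None.

    Hypothetical model: attempt i is sent at i * timeout, and the server
    answers each request server_delay seconds after it was sent, echoing
    the xid that the request carried.
    """
    give_up = retries * timeout              # client stops listening here
    for attempt in range(retries):
        reply_at = attempt * timeout + server_delay
        if reply_at >= give_up:
            break                            # nobody is listening any more
        # xid carried by this reply
        reply_xid = 0 if reuse_xid else attempt
        # xid the client is waiting for at the moment the reply arrives
        current_attempt = int(reply_at // timeout)
        expected_xid = 0 if reuse_xid else current_attempt
        if reply_xid == expected_xid:
            return reply_at
        # otherwise the reply is stale and is discarded
    return None


# New xid per retransmission: every reply arrives after its own timeout.
print(first_accepted_reply(180, 25, 10, reuse_xid=False))
# Same xid on every retransmission: the first reply to arrive will do.
print(first_accepted_reply(180, 25, 10, reuse_xid=True))
```

With a fresh xid per attempt a reply only counts if the server answers
inside a single timeout period; with a reused xid it only has to answer
before the client gives up altogether.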
You are just lucky that your server occasionally gets in there quick
enough!

>>The mount server has to read /etc/exports, and to do the host name to IP
>>address translation would also have to access /etc/hosts (or the name
>>server), and
>>      ***it writes /etc/rmtab***      [ my emphasis ]
>>. So we thought mountd might be having
>>trouble getting to /etc. But ps "snapshots" showed mountd rarely waiting
>>on disk.

To be more specific, it does a linear scan through rmtab looking to see
if this mount request is already there and adds it onto the end if it
isn't. On my main machine /etc/rmtab is 978 lines long. The reason it
is so long is that most clients unmount their disks by crashing, so the
rmtab file never gets cleared by unmount requests. On our MG1 servers
we reniced the mountd to -15 and removed all the /etc/rmtab nonsense.

I'm sorry Chuq, but all that stuff about relentless mashing of mbufs
just doesn't sound at all plausible, especially since the lucky clients
who have already mounted are getting good service. (If it hadn't been
from someone who ought to know I would have loudly decried it as
complete *@*!%*, but perhaps I'm not so certain of my ground...)

The Bottom Line:

1) Change mount to use a TCP connection to the mountd, or
   otherwise provide an idempotent RPC

2) Change mountd to use a dbm file or some other means of
   speeding up the search through rmtab.

3) Encourage people to remove rmtab as part of the boot sequence!

Actually, idempotent RPC is an easy and valuable thing to do,
especially as you just say "Buyer beware" and treat "idempotent RPC" to
mean "don't increment the xid for each retransmission".
--

William Roberts         ARPA: liam@cs.qmc.ac.uk
Queen Mary College      UUCP: liam@qmc-cs.UUCP    AppleLink: UK0087
190 Mile End Road       Tel:  01-975 5250
LONDON, E1 4NS, UK      Fax:  01-980 6533
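
For point 2 of the bottom line, here is a rough sketch of the
difference between the linear scan and a keyed lookup. It is not
mountd's actual code: the function names are invented, the
"hostname:directory" entry form is assumed for the example, and a plain
dict stands in for the dbm file.

```python
def record_mount_linear(rmtab_lines, host, directory):
    """Scan every line and append the entry only if it is missing.

    Returns the number of lines examined, i.e. the cost of one request.
    """
    entry = f"{host}:{directory}"
    examined = 0
    for line in rmtab_lines:
        examined += 1
        if line == entry:
            return examined          # already recorded, nothing to add
    rmtab_lines.append(entry)
    return examined


def record_mount_keyed(rmtab_index, host, directory):
    """Keyed lookup; the dict is a stand-in for a dbm file.

    Returns True if the entry was newly added.
    """
    entry = f"{host}:{directory}"
    if entry in rmtab_index:
        return False
    rmtab_index[entry] = True
    return True


# 978 stale entries left behind by clients that "unmounted" by crashing.
stale = [f"client{i}:/export/home" for i in range(978)]
lines = list(stale)
index = {entry: True for entry in stale}

print(record_mount_linear(lines, "newhost", "/export/home"))   # scans all 978
print(record_mount_keyed(index, "newhost", "/export/home"))    # one lookup
```

The scan pays for every stale line on every mount request, which is
exactly when the server can least afford it; the keyed version costs
about the same however long the file has grown.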