Xref: utzoo comp.protocols.nfs:1870 comp.arch:21187
Path: utzoo!utgpu!watserv1!watmath!att!linac!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!samsung!olivea!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Antonio Grandi)
Newsgroups: comp.protocols.nfs,comp.arch
Subject: Re: how many nfsd's should I run?
Message-ID:
Date: 4 Mar 91 21:02:14 GMT
References: <28975@cs.yale.edu> <476@appserv.Eng.Sun.COM>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 112
Nntp-Posting-Host: aberdb
In-reply-to: lm@slovax.Berkeley.EDU's message of 1 Mar 91 21:37:30 GMT

[ this article may have already appeared; I repost it because probably
it did not get out of the local machine; apologies if you see it more
than once ]

[ ... on SUN NFS/MMU sys time bogosity ... ]

pcg> I have seen the same server running the same load under SunOS 3 the
pcg> day before with 10-20% system time and 100-200 context switches per
pcg> second, and with SunOS 4 the day after with 80-90% system time and
pcg> 800-900 context switches per second. An MMU slot swap on a Sun 3
pcg> will take about a millisecond, which fits.

On 1 Mar 91 21:37:30 GMT, Larry McVoy commented:

lm> You may well have seen this. Jumping to the conclusion that it is
lm> caused by NFS is false, at least the reasons that you list are not
lm> true.

This you say after recognizing above that the problem existed and
claiming that in recent SunOS releases it has been obviated. Now you
seem to hint that it is NFS related, but not because of MMU context
switching.
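For what it is worth, the "which fits" in the figures quoted above is
plain arithmetic. A minimal sketch, assuming (and this is only my guess,
not a measurement) that every context switch reported costs one MMU slot
swap of about 1ms; the function name is made up for illustration:

```python
# Assumption (a guess, not a measurement): each reported context switch
# costs one MMU slot swap of roughly one millisecond on a Sun 3.
MMU_SWAP_SECONDS = 0.001


def swap_overhead_percent(context_switches_per_second,
                          swap_seconds=MMU_SWAP_SECONDS):
    """Percentage of each wall-clock second spent doing MMU swaps."""
    return 100.0 * context_switches_per_second * swap_seconds


# The SunOS 4 figures quoted above: 800-900 context switches per second.
for switches in (800, 900):
    print("%d cs/s -> about %.0f%% of the CPU in swaps"
          % (switches, swap_overhead_percent(switches)))
```

Under that assumption, 800-900 switches per second account for roughly
the 80-90% system time observed. It proves nothing about the cause; it
only shows that the numbers are consistent with the guess.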
As to me, my educated guesses are: this bogosity appears to be strictly
correlated with the number of NFS transactions processed per second; the
overhead per transaction seems to be about 1ms, and that 1ms seems to be
the cost of an MMU swap; the number of context switches per second
reported by vmstat(8) seems to be strongly correlated with the number of
active nfsd processes; and the system time accumulated by nfsd processes
becomes very large when there are many context switches per second, but
not otherwise.

Anybody with this problem (it helps to have servers running both SunOS
3.5 and SunOS 4.0.x) can have a look at the evidence, thanks to the
wonders of nfsstat(8), vmstat(1), ps(1) and pstat(8). In particular
'vmstat 1' (the 'r' 'b' 'cs' 'sy' columns) and 'ps axv' (the 'TIME' and
'PAGEIN' columns) will be revealing; 'nfsstat -ns' and 'pstat -u ' will
give extra details (sample outputs for both SunOS 3 and 4 available on
request).

The inferences that can be drawn are obvious, even if maybe wrong. After
all I don't spend too much time second guessing the *whys* of Sun
bogosities, contrary to appearances. I am already overwhelmed by those
in AT&T Sv386 at home... :-).

Pray, tell us why the above observed behaviour is not a bogosity, or at
least what was/is the cause, and how/if it has been obviated three years
ago. My explanation is a best guess, as should be pretty obvious; you
need not guess, and I am sure that enquiring minds want to know.

As to the details:

lm> nfs_svc()
lm> /* Now, release client memory; we never return back to user */
lm> relvm(u.u_procp);
lm>
lm> From the SCCS history (note the date):

The date is when the file was edited on a machine at Sun R&D... This is
slightly cheating. When did the majority of customers see this?

lm> D 2.83 87/12/15 18:34:42 kepecs 88 87
lm> remove virtual memory in async_daemon and nfs_svc as
lm> it's not needed. Remove pre-2.0 code to set rdir in nfs_svc.
lm> make sure these guys exit if error, since no vm to return to.
Note that this was well known to me, and I did write that the nfsds are
*kernel* based processes, and used to run in the kernel's context. My
suspicion is that they now (SunOS 4), either via 'bread()' or directly,
do VM mapped IO to satisfy remote requests and thus require page table
swaps, which causes problems on machines with few contexts.

For sure they are reported as having a lot of page-ins in SunOS 4, both
by 'pstat -u' and 'ps axv', while they were reported to have a lot of IO
transactions in SunOS 3. It's curious that processes that do not have an
address space are doing page ins. Maybe some kind of address space they
do have... :-)

lm> In other words, this problem went away 3 years ago, never to return.

Much software here is three years old... Same for a lot of people out
there. Also, if Sun R&D corrected the mistake three years ago on their
internal systems, it may take well over three years before it percolates
to some machines in the field. There are quite a few people still
running SunOS 3 out there (because SunOS 4.0, for this and other
reasons, performed so poorly that they have preferred to stay with an
older release, and are too scared to go on to SunOS 4.1.1 even if
admittedly it is vastly improved).

One amusing note though: one of the servers here has been recently put
on 4.1, which is still not the latest and greatest, and it still shows
appallingly high system time overheads directly proportional to NFS
load, but with an important difference: the number of context switches
per second reported by vmstat(1) is no longer appallingly high, even if
it counts the nfs daemons in the runnable and blocked categories.

What's going on? I have the suspicion that the number of context
switches per second now simply excludes those for the nfsd processes.
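Anybody who wants to test that suspicion on their own server could
compare how well system time tracks NFS load against how well it tracks
the reported context switches. A minimal sketch, assuming the per-second
rates have already been worked out by hand from 'nfsstat -ns' and
'vmstat 1' readings; the helper name and the sample numbers are invented
purely for illustration and are NOT measurements from any machine:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL samples, made up to exercise the code; substitute your own:
# (NFS calls per second, context switches per second, % system time)
samples = [
    (50, 110, 12),
    (150, 120, 35),
    (300, 105, 62),
    (400, 115, 84),
]
nfs_calls = [s[0] for s in samples]
ctx_switches = [s[1] for s in samples]
sys_percent = [s[2] for s in samples]

print("sys time vs NFS calls/s:        %.2f"
      % pearson(nfs_calls, sys_percent))
print("sys time vs context switches/s: %.2f"
      % pearson(ctx_switches, sys_percent))
```

If system time correlates strongly with NFS calls but hardly at all with
the reported context switches, that would be consistent with the
switches for the nfsd processes no longer being counted, though it would
not prove it.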
Final note: as usual, I want to remind everybody that I am essentially
just a guest for News and Mail access at this site, and therefore none
of my postings should reflect on the reputation of the research
performed by the Coleg Prifysgol Cymru, in any way. I mention my
observations of their systems solely because they are those at hand.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk