Xref: utzoo comp.protocols.nfs:1870 comp.arch:21187
Path: utzoo!utgpu!watserv1!watmath!att!linac!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!samsung!olivea!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Antonio Grandi)
Newsgroups: comp.protocols.nfs,comp.arch
Subject: Re: how many nfsd's should I run?
Message-ID: <PCG.91Mar4210214@aberdb.cs.aber.ac.uk>
Date: 4 Mar 91 21:02:14 GMT
References: <28975@cs.yale.edu> <PCG.91Feb28195347@odin.cs.aber.ac.uk>
	<476@appserv.Eng.Sun.COM>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 112
Nntp-Posting-Host: aberdb
In-reply-to: lm@slovax.Berkeley.EDU's message of 1 Mar 91 21:37:30 GMT


[ this article may have already appeared; I repost it because probably
it did not get out of the local machine; apologies if you see it more
than once ]

	[ ... on SUN NFS/MMU sys time bogosity ... ]

pcg> I have seen the same server running the same load under SunOS 3 the
pcg> day before with 10-20% system time and 100-200 context switches per
pcg> second, and with SunOS 4 the day after with 80-90% system time and
pcg> 800-900 context switches per second. An MMU slot swap on a Sun 3
pcg> will take about a millisecond, which fits.

On 1 Mar 91 21:37:30 GMT, Larry McVoy commented:

lm> You may well have seen this.  Jumping to the conclusion that it is
lm> caused by NFS is false, at least the reasons that you list are not
lm> true.

This you say after recognizing above that the problem existed and
claiming that in recent SunOS releases it has been obviated. Now you
seem to hint that it is NFS related, but not because of MMU context
switching.

As for me, my educated guesses are these: the bogosity appears to be
strictly correlated with the number of NFS transactions processed per
second; the overhead per transaction seems to be about 1ms, which is
also about the cost of an MMU swap; the number of context switches per
second reported by vmstat(8) seems to be strongly correlated with the
number of active nfsd processes; and the system time accumulated by nfsd
processes becomes very large when there are many context switches per
second, but not otherwise.
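
A quick back-of-the-envelope check of that arithmetic, as a minimal
Python sketch. The function name and the flat 1ms cost charged to every
switch are assumptions made for illustration; the switch rates are the
ones I quoted earlier in this thread.

```python
def system_time_fraction(switches_per_sec, cost_ms=1.0):
    """Fraction of each second spent switching, assuming every context
    switch costs a full MMU swap of cost_ms milliseconds (an assumption,
    not a measurement)."""
    return switches_per_sec * cost_ms / 1000.0


# Context switch rates quoted earlier: SunOS 3 first, then SunOS 4.
for rate in (100, 200, 800, 900):
    share = system_time_fraction(rate)
    print(f"{rate} cs/s -> {share:.0%} of the CPU")
```

Under that assumption the four rates come out at 10%, 20%, 80% and 90%,
which matches the system time ranges I reported for both releases. That
is consistent with the guess, but it does not prove it.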

Anybody with this problem (it helps to have servers running both SunOS
3.5 and SunOS 4.0.x) can have a look at the evidence, thanks to the
wonders of nfsstat(8), vmstat(8), ps(1) and pstat(8). In particular
'vmstat 1' (the 'r' 'b' 'cs' 'sy' columns), 'ps axv' (the 'TIME' and
'PAGEIN' columns) will be revealing; 'nfsstat -ns' and 'pstat -u <nfsd
pid>' will give extra details (sample outputs for both SunOS 3 and 4
available on request).
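
For anybody who wants to put a number on "correlated", here is a minimal
sketch of a Pearson correlation in plain Python. The function name and
the idea of pairing per-interval samples (say, NFS calls per second
against the 'cs' column) are my own illustrative assumptions; collecting
the samples from the tools above is left to the reader.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

A value near 1 for (NFS calls per second, context switches per second)
would support the guess above; a value near 0 would count against it.
Correlation alone says nothing about the cause, of course.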

The inferences that can be drawn are obvious, even if maybe wrong. After
all I don't spend too much time second guessing the *whys* of Sun
bogosities, contrary to appearances. I am already overwhelmed by those
in AT&T Sv386 at home...  :-).

Pray, tell us why the above observed behaviour is not a bogosity, or at
least what was/is the cause, and how/if it has been obviated three years
ago.  My explanation is a best guess, as should be pretty obvious; you
need not guess, and I am sure that enquiring minds want to know.


As to the details:

lm> nfs_svc()

lm>        /* Now, release client memory; we never return back to user */
lm>        relvm(u.u_procp);

lm> From the SCCS history (note the date):

The date is when the file was edited on a machine at Sun R&D... This is
slightly cheating. When did the majority of customers see this? 

lm>	D 2.83 87/12/15 18:34:42 kepecs 88 87
lm>	remove virtual memory in async_daemon and nfs_svc as
lm>	it's not needed. Remove pre-2.0 code to set rdir in nfs_svc.
lm>	make sure these guys exit if error, since no vm to return to.

Note that this was well known to me, and I did write that the nfsds are
*kernel* based processes, and used to run in the kernel's context.

My suspicion is that they now (SunOS 4), either via 'bread()' or
directly, do VM mapped IO to satisfy remote requests and thus require
page table swaps, which causes problems on machines with few contexts.
They are certainly reported as doing a lot of page-ins in SunOS 4, both
by 'pstat -u' and 'ps axv', while they were reported as doing a lot of
IO transactions in SunOS 3. It's curious that processes that do not have
an address space are doing page-ins. Maybe some kind of address space
they do have... :-)

lm> In other words, this problem went away 3 years ago, never to return.

Much software here is three years old... Same for a lot of people out
there. Also, if Sun R&D corrected the mistake three years ago on their
internal systems, it may take well over three years before it percolates
to some machines in the field.

There are quite a few people out there still running SunOS 3: SunOS 4.0,
for this and other reasons, performed so poorly that they preferred to
stay with the older release, and they are too scared to move on to SunOS
4.1.1 even though it is admittedly vastly improved.

One amusing note though: one of the servers here was recently moved to
4.1, which is still not the latest and greatest, and it still shows
appallingly high system time overheads directly proportional to NFS
load, but with an important difference: the number of context switches
per second reported by vmstat(8) is no longer appallingly high, even
though vmstat still counts the nfs daemons in the runnable and blocked
categories. What's going on?  My suspicion is that the number of context
switches per second now simply excludes those for the nfsd processes.
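
One way to test that suspicion is to turn the arithmetic around and ask
what each *reported* context switch would have to cost in order to
account for the observed system time. A minimal Python sketch; the
function name is hypothetical and the inputs are whatever your own
'vmstat 1' run shows.

```python
def implied_cost_ms(sys_fraction, switches_per_sec):
    """Milliseconds each reported context switch would have to cost if
    the switches alone explained the observed system time.

    sys_fraction is the system time as a fraction of the CPU (0.0-1.0),
    switches_per_sec is the reported context switch rate."""
    if switches_per_sec <= 0:
        raise ValueError("need a positive context switch rate")
    return sys_fraction * 1000.0 / switches_per_sec
```

If the result is far above the roughly 1ms I estimated for an MMU swap,
then the reported switches cannot account for the system time on their
own: either some switches are not being counted, or the time is going
somewhere else entirely. The sketch cannot tell those two apart.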


Final note: as usual, I want to remind everybody that I am essentially
just a guest for News and Mail access at this site, and therefore none
of my postings should reflect on the reputation of the research
performed by the Coleg Prifysgol Cymru, in any way. I mention my
observations of their systems solely because they are those at hand.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk