Xref: utzoo comp.protocols.nfs:1853 comp.arch:21152
Path: utzoo!utgpu!watserv1!watmath!att!linac!uwm.edu!spool.mu.edu!uunet!mcsun!ukc!edcastle!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.protocols.nfs,comp.arch
Subject: Re: how many nfsd's should I run?
Message-ID: <PCG.91Feb28195347@odin.cs.aber.ac.uk>
Date: 28 Feb 91 19:53:47 GMT
References: <28975@cs.yale.edu>
	<1991Feb22.012532.26075@murdoch.acc.Virginia.EDU> <4218@skye.ed.ac.uk>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 112
Nntp-Posting-Host: odin
In-reply-to: richard@aiai.ed.ac.uk's message of 22 Feb 91 16:14:12 GMT


I have crossposted to comp.arch, because this is really a system/network
architecture question. NFS is almost incidental :-).

On 22 Feb 91 16:14:12 GMT, richard@aiai.ed.ac.uk (Richard Tobin) said:

richard> In article <1991Feb22.012532.26075@murdoch.acc.Virginia.EDU>
richard> gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) writes:

gl8f> If you have too many processes competing for the limited slots in the
gl8f> hardware context cache, your machine will roll over and die. You can
gl8f> look up this number in your hardware manuals somewhere. For low-end
gl8f> sun4's the number is 8. I run 4 nfsd's on such machines. The same
gl8f> problem can bite you with too many biods.

richard> Given that nfsd runs in kernel mode inside nfssvc(), is this
richard> statement about contexts correct?

Yes and no, depending on who your vendor is, and which OS revision and
machine model you have. For Sun there is some history that may be worth
mentioning. Under SunOS 3 the nfsds were in effect kernel processes, so
that they could access the buffer cache, held in the kernel address
space, without copies. Since all the nfsds ran in the kernel page table,
there was no problem.

Under SunOS 4 the buffer cache went away, so each nfsd was given its own
address space (memory mapped IO), while still being technically a kernel
process. This meant that MMU slot thrashing was virtually guaranteed, as
the nfs daemons are activated more or less FIFO and the MMU has a LIFO
replacement policy. As soon as the number of nfsds is greater than or
equal to the number of MMU slots, problems happen.

I have seen the same server running the same load under SunOS 3 the day
before with 10-20% system time and 100-200 context switches per second,
and with SunOS 4 the day after with 80-90% system time and 800-900
context switches per second. An MMU slot swap on a Sun 3 will take about
a millisecond, which fits.
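
The "which fits" above is simple arithmetic, sketched here in Python.
The function name is made up for illustration, and the sketch assumes
the worst case in which every context switch costs one full MMU slot
swap of about a millisecond; real loads will be somewhat below that.

```python
def mmu_swap_system_time_percent(context_switches_per_s, swap_cost_ms=1.0):
    """Percent of each second spent swapping MMU slots.

    Assumption (worst case): every context switch triggers one slot swap.
    """
    busy_ms_per_second = context_switches_per_s * swap_cost_ms
    return busy_ms_per_second * 100 / 1000


# The SunOS 4 observation: 800-900 context switches per second.
for rate in (800, 900):
    print(rate, "switches/s ->", mmu_swap_system_time_percent(rate), "% system time")
```

At a millisecond per swap, 800-900 switches per second account for
roughly the 80-90% system time observed.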

Under SunOS 4.1.1 things may well be different, as Sun may have
corrected the problem (by making all the nfsds share a single address
space and giving each of them a section of it in which to map the
relevant files, for example, or by better tuning the MMU cache
replacement policy to the nfsd activation patterns, for another
example). On larger Sun 4s there are many more MMU slots, say 64, so the
problem effectively does not happen for any sensible number of nfsds.

richard> If so, why is the default number of nfsds for Sun 3s 8?

Sun bogosity :-).


As to the general problem of how many NFS daemons, I have already posted
long treatises on the subject. Briefly, however, the argument is:


Each nfsd is synchronous, that is, it may carry out only one operation
at a time, in a cycle: read request packet, find out what it means, go
to the IO subsystem to read/write the relevant block, write the result
packet, loop.
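
The cycle can be sketched as a toy loop in Python. This is not nfsd
code: the request tuples, the dictionary standing in for the IO
subsystem, and the function name are all invented for illustration.
The only point is that one daemon finishes a request before it looks
at the next.

```python
from collections import deque


def synchronous_daemon(requests, storage):
    """Serve requests strictly one at a time (hypothetical toy model)."""
    pending = deque(requests)
    replies = []
    while pending:
        op, block, data = pending.popleft()   # read request packet
        if op == "read":                      # find out what it means
            result = storage.get(block)       # go to the IO subsystem
        elif op == "write":
            storage[block] = data
            result = "ok"
        else:
            result = "error"
        replies.append((op, block, result))   # write the result packet
    return replies                            # (a real daemon loops forever)


disk = {7: b"old"}
print(synchronous_daemon([("read", 7, None), ("write", 7, b"new"),
                          ("read", 7, None)], disk))
```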

Clearly on a server that has X network interfaces, Y CPUs, and Z disks
(if your controller supports overlapping transfers, otherwise it is the
number of controllers) there cannot be more than X+Y+Z nfsds active, as
at most X nfsds can be reading or writing a packet from a network
interface, at most Y nfsds can be running kernel code, and at most Z
nfsds can be waiting for a read or a write from a disk.
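
As a sketch, the bound is a one-line sum. The function and parameter
names below are made up; the only rule taken from the argument above is
that Z is the number of disks if the controller overlaps transfers, and
the number of controllers otherwise.

```python
def max_active_nfsds(network_interfaces, cpus, disks, controllers,
                     overlapping_transfers=True):
    """Upper bound X+Y+Z on simultaneously active nfsds."""
    x = network_interfaces
    y = cpus
    # Z: disks if transfers can overlap, else one per controller.
    z = disks if overlapping_transfers else controllers
    return x + y + z


# Hypothetical server: 1 Ethernet interface, 1 CPU, 4 disks, 1 controller.
print(max_active_nfsds(1, 1, 4, 1))                               # 6
print(max_active_nfsds(1, 1, 4, 1, overlapping_transfers=False))  # 3
```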

The optimum number may be lower than X+Y+Z, because it is damn unlikely
that the maximum multiprogramming level will actually be as high as
that, and there may be other processes that compete with nfsds for the
network interfaces, or the CPUs, or the disks.

It may also be higher, because this would allow multiple IO requests to
be queued waiting for a disk, thus giving the arm movement optimizer a
chance to work (if there is only ever one outstanding request per disk,
this implies a de facto FCFS arm movement policy).

The latter argument is somewhat doubtful as there is contradictory
evidence about the relative merits of FCFS and of elevator style sorting
as used by the Unix kernel.
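
To see what the sorter has to work with, here is a small Python sketch
comparing total head travel under FCFS and under a simplified one-sweep
elevator. It is not the Unix kernel's sorting routine, the cylinder
numbers are an arbitrary illustrative queue, and head travel is only
one ingredient of throughput, so this does not settle the argument
either way.

```python
def fcfs_travel(head, requests):
    """Total cylinders crossed when serving requests in arrival order."""
    total = 0
    for cylinder in requests:
        total += abs(cylinder - head)
        head = cylinder
    return total


def elevator_travel(head, requests):
    """Simplified elevator: sweep upward from the head, then back down."""
    upward = sorted(c for c in requests if c >= head)
    downward = sorted((c for c in requests if c < head), reverse=True)
    return fcfs_travel(head, upward + downward)


queue = [98, 183, 37, 122, 14, 124, 65, 67]   # arbitrary example cylinders
print(fcfs_travel(53, queue))        # 640
print(elevator_travel(53, queue))    # 299
# With a single outstanding request there is nothing to sort:
print(fcfs_travel(53, [98]) == elevator_travel(53, [98]))   # True
```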

All in all I think that X+Y+Z is a reasonable estimate, or maybe a
slightly larger number than that if you are persuaded that giving a
chance to the disk request sorter is worthwhile (which may not be true
for a remote file server, as opposed to a timesharing system where it
is almost always worthwhile).

Naturally this is only the "benefit" side of the equation. As to the
"cost" side, it used to be that nfsds had a very low cost (a proc table
slot each and little more), so slightly overallocating them was not a
big problem.  But on some OS/machine combinations the cost becomes very
large over a certain threshold, and this may mean that reducing the
number below the theoretical maximum pays off.

Finally there is the question of the Ethernet bandwidth. In the best of
cases an Ethernet interface can read about 1000 packets/s, and write
800KB/s (we assume that requests are small, so the number of packets/s
matters, while results are large, so the number of KB/s matters;
stat(2) and read/exec(2) are far more common than write(2)).

Divide that by the number of clients that may be actively requesting
data (usually about a tenth of the total number of machines on a wire
are actively doing remote IO), and you get pretty depressing numbers.
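
A quick Python sketch of that division, using the best-case figures
above. The function name and the 50-machine wire are hypothetical; the
one-in-ten active ratio is the rule of thumb just mentioned.

```python
def per_client_share(total_machines, active_one_in=10,
                     packets_per_s=1000, kb_per_s=800):
    """Best-case server Ethernet capacity split among active clients."""
    active = max(1, total_machines // active_one_in)
    return active, packets_per_s / active, kb_per_s / active


# Hypothetical wire with 50 machines on it.
active, packets, kb = per_client_share(50)
print(active, "active clients:", packets, "requests/s and", kb, "KB/s each")
```

With 50 machines that is 5 active clients, each getting at best 200
requests/s and 160KB/s.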

It may be pointless to have, say, 4 server disks at 2MB/s, each capable
of 50 transactions per second of, say, 8-16KB each, and enough nfsds to
take advantage of this parallelism and bandwidth, if the Ethernet wire
and interface are the bottleneck.
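
Putting the example figures side by side in Python (the names are
illustrative, and the 800KB/s is the best-case Ethernet write rate
quoted earlier):

```python
ETHERNET_KB_PER_S = 800   # best-case interface write rate from above


def disk_supply_kb_per_s(disks, transactions_per_s, kb_per_transaction):
    """Aggregate data rate the disks could deliver if kept busy."""
    return disks * transactions_per_s * kb_per_transaction


for kb in (8, 16):
    supply = disk_supply_kb_per_s(4, 50, kb)
    print(kb, "KB transactions:", supply, "KB/s from disks,",
          supply / ETHERNET_KB_PER_S, "times the Ethernet rate")
```

The four disks can supply 1600-3200KB/s, that is 2 to 4 times what the
wire can carry.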
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk