Xref: utzoo comp.protocols.nfs:1853 comp.arch:21152
Path: utzoo!utgpu!watserv1!watmath!att!linac!uwm.edu!spool.mu.edu!uunet!mcsun!ukc!edcastle!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.protocols.nfs,comp.arch
Subject: Re: how many nfsd's should I run?
Message-ID:
Date: 28 Feb 91 19:53:47 GMT
References: <28975@cs.yale.edu> <1991Feb22.012532.26075@murdoch.acc.Virginia.EDU> <4218@skye.ed.ac.uk>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 112
Nntp-Posting-Host: odin
In-reply-to: richard@aiai.ed.ac.uk's message of 22 Feb 91 16:14:12 GMT

I have crossposted to comp.arch, because this is really a system/network
architecture question. NFS is almost incidental :-).

On 22 Feb 91 16:14:12 GMT, richard@aiai.ed.ac.uk (Richard Tobin) said:

richard> In article <1991Feb22.012532.26075@murdoch.acc.Virginia.EDU>
richard> gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) writes:

gl8f> If you have too many processes competing for the limited slots in the
gl8f> hardware context cache, your machine will roll over and die. You can
gl8f> look up this number in your hardware manuals somewhere. For low-end
gl8f> sun4's the number is 8. I run 4 nfsd's on such machines. The same
gl8f> problem can bite you with too many biods.

richard> Given that nfsd runs in kernel mode inside nfssvc(), is this
richard> statement about contexts correct?

Yes and no, depending on who your vendor is, and on which OS revision and
machine model you have. For Sun there is some history that may be worth
mentioning.

Under SunOS 3 the nfsds were in effect kernel processes, so that they
could access the buffer cache, held in the kernel address space, without
copies. Since all nfsds ran in the kernel page table there was no problem.

Under SunOS 4 the buffer cache went away, so each nfsd was given its own
address space (memory mapped IO), while still being technically a kernel
process.
This meant that MMU slot thrashing was virtually guaranteed, as the
nfs daemons are activated more or less FIFO and the MMU has a LIFO
replacement policy. As soon as the number of nfsds is greater than or
equal to the number of MMU slots, problems happen.

I have seen the same server running the same load under SunOS 3 the day
before with 10-20% system time and 100-200 context switches per second,
and with SunOS 4 the day after with 80-90% system time and 800-900
context switches per second. An MMU slot swap on a Sun 3 will take about
a millisecond, which fits.

Under SunOS 4.1.1 things may well be different, as Sun may have corrected
the problem (by making all the nfsds share a single address space and
giving each of them a section of it in which to map the relevant files,
for example, or by better tuning the MMU cache replacement policy to the
nfsd activation patterns, for another example).

On larger Sun 4s there are many more MMU slots, say 64, so the problem
effectively does not happen for any sensible number of nfsds.

richard> If so, why is the default number of nfsds for Sun 3s 8?

Sun bogosity :-).

As to the general problem of how many NFS daemons to run, I have already
posted long treatises on the subject. Briefly, however, the argument is:

Each nfsd is synchronous, that is, it may carry out only one operation at
a time, in a cycle: read request packet, find out what it means, go to
the IO subsystem to read/write the relevant block, write the result
packet, loop.

Clearly on a server that has X network interfaces, Y CPUs, and Z disks
(if your controller supports overlapping transfers, otherwise Z is the
number of controllers) there cannot be more than X+Y+Z nfsds active, as
at most X nfsds can be reading or writing a packet from a network
interface, at most Y nfsds can be running kernel code, and at most Z
nfsds can be waiting for a read or a write from a disk.
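
The X+Y+Z bound above is only arithmetic, but a small sketch makes the
disk/controller caveat explicit. This is an illustration in Python, not
anything taken from a real NFS implementation: the function name, its
parameters and the example server are all made up for the purpose.

```python
def nfsd_upper_bound(interfaces, cpus, disks, controllers,
                     overlapped_transfers=True):
    """Upper bound on simultaneously active nfsds: X + Y + Z.

    X = network interfaces, Y = CPUs, Z = disks if the controllers
    support overlapping transfers, otherwise Z = controllers.
    All names here are hypothetical, chosen for this sketch only.
    """
    counts = (interfaces, cpus, disks, controllers)
    if any(n < 0 for n in counts):
        raise ValueError("resource counts must be non-negative")
    z = disks if overlapped_transfers else controllers
    return interfaces + cpus + z


# Hypothetical server: 1 Ethernet interface, 1 CPU, 4 disks, 1 controller.
print(nfsd_upper_bound(1, 1, 4, 1))                              # 6
print(nfsd_upper_bound(1, 1, 4, 1, overlapped_transfers=False))  # 3
```

Note that this is only the ceiling on useful parallelism; it says nothing
about the cost of running that many daemons on a given machine.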
The optimum number may be lower than X+Y+Z, because it is damn unlikely
that the maximum multiprogramming level will actually be as high as that,
and there may be other processes that compete with nfsds for the network
interfaces, or the CPUs, or the disks.

It may also be higher, because this would allow multiple IO requests to
be queued waiting for a disk, thus giving the arm movement optimizer a
chance to work (if there is only ever one outstanding request per disk,
this implies a de facto FCFS arm movement policy). The latter argument is
somewhat doubtful, as there is contradictory evidence about the relative
merits of FCFS and of elevator style sorting as used by the Unix kernel.

All in all I think that X+Y+Z is a reasonable estimate, or maybe a
slightly larger number than that if you are persuaded that giving a
chance to the disk request sorter is worthwhile (which may not be true
for a remote file server, as opposed to a timesharing system where it is
almost always worthwhile).

Naturally this is only the "benefit" side of the equation. As to the
"cost" side, it used to be that nfsds had a very low cost (a proc table
slot each and little more), so slightly overallocating them was not a big
problem. But on some OS/machine combinations the cost becomes very large
over a certain threshold, and this may mean that reducing the number
below the theoretical maximum pays off.

Finally there is the question of the Ethernet bandwidth. In the best of
cases an Ethernet interface can read about 1000 packets/s, and write
800KB/s (we assume that requests are small, so the number of packets/s
matters, while results are large, so the number of KB/s matters; stat(2)
and read/exec(2) are far more common than write(2)). Divide that by the
number of clients that may be actively requesting data (usually about a
tenth of the total number of machines on a wire are actively doing remote
IO), and you get pretty depressing numbers.
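
To put rough numbers on the division just described, here is a minimal
Python sketch. The 1000 packets/s, 800KB/s and one-in-ten figures are the
best-case estimates quoted above; the function name, its parameters and
the 100-machine wire are assumptions made up for illustration.

```python
def per_client_share(total_machines, one_in=10,
                     request_packets_per_s=1000.0, reply_kb_per_s=800.0):
    """Best-case share of one Ethernet interface per active client.

    Assumes (as in the text) that about one machine in `one_in` is
    actively doing remote IO; at least one client is always counted.
    Returns (request packets/s, reply KB/s) available to each of them.
    """
    if total_machines < 1 or one_in < 1:
        raise ValueError("need at least one machine and one_in >= 1")
    active = max(1, total_machines // one_in)
    return request_packets_per_s / active, reply_kb_per_s / active


# Hypothetical wire with 100 machines, so about 10 active clients.
print(per_client_share(100))  # (100.0, 80.0)
```

That is 80KB/s of replies per active client in the best case, before any
other traffic on the wire is taken into account.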
It may be pointless to have, say, four 2MB/s server disks, each capable
of 50 transactions per second of, say, 8-16KB each, and enough nfsds to
take advantage of this parallelism and bandwidth, if the Ethernet wire
and interface are the bottleneck.
-- 
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk