Path: utzoo!attcan!uunet!cs.utexas.edu!rice!sun-spots-request
From: pcg@compsci.aberystwyth.ac.uk (Piercarlo Grandi)
Newsgroups: comp.sys.sun
Subject: Re: number of nfsd processes to start (L
Keywords: Networks
Message-ID: <4255@brazos.Rice.edu>
Date: 9 Jan 90 21:22:56 GMT
Sender: root@rice.edu
Organization: Sun-Spots
Lines: 211
Approved: Sun-Spots@rice.edu
X-Refs: Original: v9n2
X-Sun-Spots-Digest: Volume 9, Issue 6, message 9 of 9

In article <4116@brazos.Rice.edu> monty@delphi.bsd.uchicago.edu (Monty
Mullig) writes:

> The man page for nfsd suggests that "four is a good number" of nfsd
> processes to start, but doesn't give any further information on how
> to choose the best number to start.

The answer will be long... It also discusses how to improve performance
in general.

In blue sky theory, the number of nfsd processes should be equal to the
number of discs you have plus two; this covers the case where all the
discs are busy, one nfsd is working the ethernet (if you have more than
one ethernet interface, more than one nfsd could be reading or writing
to the ethernet) and one nfsd is running the CPU as well.

In practice, this may well be too optimistic, because the number of
concurrently busy nfsds is liable virtually never to reach its
theoretical maximum. If you have fewer nfsd processes, especially under
SunOS 4, the number of context switches will go down dramatically,
particularly if your server is used mostly for NFS service. Another way
to get context switches down is conceivably to tune the NFS request
sizes a bit, but here I'd like to hear somebody else's experiences.

Remember also that nfsds are activated FIFO, not LIFO, and this means
that they will *all* be active, and thus each of them will, even with
low traffic, consume one of the precious SUN MMU slots (one of the
reasons why nfsd should be multithreaded, not multiforked).
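
As a rough illustration of the blue-sky rule above, here is a minimal
sketch in POSIX shell. The function name nfsd_upper_bound and its
arguments are hypothetical, and counting one nfsd per ethernet interface
is my reading of the parenthetical remark about multiple interfaces.
The result is a theoretical ceiling, not a recommendation; as noted, in
practice fewer is likely better.

```shell
#!/bin/sh
# Hypothetical helper: theoretical ceiling on useful nfsd processes.
# One per disc, one per ethernet interface, plus one for the CPU.
# With a single interface this reduces to "discs plus two".
nfsd_upper_bound() {
    discs=$1
    ethers=${2:-1}          # assumption: default to one ethernet interface
    echo $((discs + ethers + 1))
}

# Two discs, one ethernet interface.
nfsd_upper_bound 2
```

For two discs and one interface this prints 4, which happens to match
the man page's "four is a good number".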

> We have 10 diskless clients and about 20 PCs running off of a 4/280
> and our diskless 3/50s have recently begun to run noticeably slower
> (we just added 4).

This is quite a load, especially on the ethernet interface. Tree-like
communication patterns on Ethernet are known to be *bad*.

> Should we increase the nfsd processes?

Definitely *not*. You don't add any processing power by adding new
daemons, you only add opportunity for interleaving, and this cannot go
higher than the number of devices that can be busy at one time. On the
contrary, context switching is likely to become more frequent, and
remember also that the page context cache on a SUN has a fixed number of
slots, and having more active processes than the number of slots
(typically 8 or 16, the 4/280 may have 32) is *A BAD IDEA*.

> our 4/280 has one controller on each drive, with the client swap
> partitions on a rimfire controller and a hitachi 892 MB drive. we
> have 32 MB on the 4/280.

What you should do is increase the buffer cache size on the clients (say
to 20-25% of memory) by suitably patching 'bufpages' (number of 8 kbyte
buffer slots) and 'nbuf' (number of buffer headers, which should be no
smaller than about four times the value of 'bufpages'), increase it on
the server (say to 40-50% of memory), and balance the swap and root
partitions across the two drives. You may want to do the following:

    # 4 Meg (512 pages) Sun3/50 workstation
    #   bufpages: 20-25%, cache locally
    #   nbuf:     small dirs/files cached
    adb -w /vmunix <<@
    bufpages?W 0t100
    nbuf?W 0t600
    @

    # 32 Meg (4096 pages) Sun4/280 file server
    #   bufpages: 30-35%, pointless more than 10 megs
    #   nbuf:     probably larger files cached
    adb -w /vmunix <<@
    bufpages?W 0t1200
    nbuf?W 0t5000
    @

You may want to distribute your filesystems as follows:

    drive 1: root (including /var, /private) plus /usr (/share, ...)
    drive 2: swap plus /users

You often want to have half of the clients' roots on the first disc with
their swaps on the second disc, and vice versa for the other half.
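
The buffer cache figures in the adb examples above follow from simple
arithmetic on memory size, so here is a minimal sketch of that
calculation. The function name bufcache_adb and its arguments are
hypothetical; it assumes 8 kbyte pages as stated above, and it only
prints adb input lines rather than patching any kernel. Its results are
not rounded to the convenient figures used in the examples.

```shell
#!/bin/sh
# Hypothetical helper: print adb patch lines for a given cache share.
#   $1 = main memory in megabytes
#   $2 = percentage of memory to give to the buffer cache
#   $3 = nbuf-to-bufpages ratio (assumption: defaults to 4, the minimum
#        suggested in the text)
bufcache_adb() {
    mem_mb=$1
    pct=$2
    ratio=${3:-4}
    pages=$((mem_mb * 128))             # 1 Meg = 128 pages of 8 kbytes
    bufpages=$((pages * pct / 100))     # integer division rounds down
    nbuf=$((bufpages * ratio))
    printf 'bufpages?W 0t%d\nnbuf?W 0t%d\n' "$bufpages" "$nbuf"
}

# 4 Meg workstation, 20% of memory, six headers per buffer page.
bufcache_adb 4 20 6
```

For the 4 Meg workstation this yields bufpages 102 and nbuf 612, close
to the round 100 and 600 used above.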

It may also be a good (but probably marginally so) idea to duplicate the
read-only filesystems on both drives, and have half of the clients use
one copy and half the other copy. For example, assuming your clients are
in sets A and B:

    drive 1:
        server root+/private
        set A roots
        set A shared (used also by server)
        set B swaps
        set B homes

    drive 2:
        server swap
        set B roots
        set B shared (a copy of set A's)
        set A swaps
        set A homes

Some extra care can be taken, for example, to ensure that the high
traffic filesystems, be they the shared binaries and libraries or the
user filesystems, are in the middle of the discs, to minimize expected
arm motion (the layout above reflects this). Each user should be
notionally assigned a most frequently used workstation, and the home
directory for the user reckoned to be in the same set as the
workstation.

You should *very* seriously consider taking both discs off your 4/280
and putting each on a smaller machine (each with a dumping device, e.g.
an exabyte). Unless you configure a lot of nfsd processes that chew up
context switch time, an NFS server is strongly I/O bound, in both
ethernet board and disc bandwidth; CPU speed and main memory size almost
don't matter.

By having two discs on two different machines you are putting each disc
behind its own ethernet interface, you have a two-rooted communication
pattern, and you make each disc served by a whole machine. If you follow
my ideas above and split the load between the two discs as I have
suggested, you will be able to gain additional performance, as for
example any program that copies from user space to /tmp and back will
involve *three* machines in parallel, with the potential for significant
overlapping.

You also free your expensive, fast 4/280 as a diskless compute server,
and you could use the Purdue system that automagically selects the least
loaded machine for executing commands, or statically replace known piggy
(in either memory or time) applications (e.g. troff, nroff, lisp) with
scripts like 'exec rsh sun4 troff "$@"'...

If you do this, the buffer cache allocations (always assuming you are
still running SunOS 3) could be:

    # 4 Meg (512 pages) Sun3/50 workstation
    #   bufpages: 20-25%, cache locally
    #   nbuf:     small dirs/files cached
    adb -w /vmunix <<@
    bufpages?W 0t90
    nbuf?W 0t540
    @

    # 4 Meg (512 pages) Sun3/50 file server
    #   bufpages: 50%, what else use memory for?
    #   nbuf:     probably larger files cached
    adb -w /vmunix <<@
    bufpages?W 0t200
    nbuf?W 0t1000
    @

    # 32 Meg (4096 pages) Sun4/280 compute server
    #   bufpages: 15%-20%, CPU/memory bound jobs
    #   nbuf:     probably larger files cached
    adb -w /vmunix <<@
    bufpages?W 0t600
    nbuf?W 0t3000
    @

You should consider adding a local swap+/tmp disc to the compute server
(in your case I would suggest something like 300 MBytes, two thirds to
swap, a third to /tmp -- you don't want to support address spaces up to
more than say 6-7 times your main memory, because then thrashing is
virtually guaranteed), so that it may be used especially efficiently for
programs that require large address spaces or have large intermediate
files, and so that you can save disc space by giving the workstations
smaller swap (for example, a 4 Meg Sun 3/50 might well do with just 8
Megs swap, depending on which windowing system you use, etc...) and
private allocations than would otherwise be necessary.

Adding a local, cheap, small (say 40 megs) disc to your workstations for
swap and /tmp and /private|/var can result in considerable reductions in
ethernet traffic and load on the servers. It is useless to have fast
discs on the servers if the path to them is busy, and/or choked by the
ethernet boards.

Sharing home directories, executables and libraries on a remote disc
makes sense for saving space, simplifying dumping, and simplifying
administration.
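
The local swap+/tmp split suggested above for the compute server can be
checked with a little arithmetic. This is a minimal sketch only: the
function name split_local_disc is hypothetical, it takes 7 as the
ceiling from the "6-7 times" guideline, and it treats swap size as a
stand-in for the largest supportable address space, which is a
simplifying assumption.

```shell
#!/bin/sh
# Hypothetical helper: split a local disc two thirds swap, one third
# /tmp, and compare the swap share against seven times main memory.
#   $1 = local disc size in megabytes
#   $2 = main memory in megabytes
split_local_disc() {
    disc_mb=$1
    mem_mb=$2
    swap_mb=$((disc_mb * 2 / 3))
    tmp_mb=$((disc_mb - swap_mb))
    if [ "$swap_mb" -le $((mem_mb * 7)) ]; then
        verdict=ok
    else
        verdict=too-large       # assumption: beyond 7x memory invites thrashing
    fi
    echo "swap=${swap_mb}MB tmp=${tmp_mb}MB swap-vs-memory=${verdict}"
}

# 300 MByte local disc on the 32 Meg compute server.
split_local_disc 300 32
```

With 300 MBytes and 32 Megs of memory this gives 200 for swap and 100
for /tmp, and 200 is a little over six times main memory, so it stays
within the guideline.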

Having multiple workstations instead of terminals can offload, from the
expensive compute servers, high overhead user interactions or processes
(edits, small compiles) that nonetheless require good response time (on
many a PDP or VAX, when vi was introduced, or uucp was running,
interrupt overheads killed the CPU, and we don't want to reproduce this
again...).

Sharing on the server discs probably also does not cost much in
performance (it may even help if the server discs are fast, as most
large discs are, and cost per byte is cheaper than with multiple small
discs, as long as the path to them is not choked). Both user files and
shared utilities have expected good locality (users don't often edit or
compile hundreds of different large files in a session) and/or caching
profile: executables can be made sticky and fetched repeatedly from the
swap partition, because users don't often use hundreds of different
commands in rapid succession in a session; all shared material is read
only, like libraries, etc...

On the contrary, swap and temporary or spool files are guaranteed to be
bad for being locally cached, either because they aren't cached at all
(swap is not cached), or because they are guaranteed (most temporaries
or spool files) to be written as often as read, and NFS is essentially
write-thru, on the assumption that reads are much more frequent than
writes.

Finally, check all of the above with monitoring software. Using netstat,
iostat and vmstat will give you a lot of insight. Use also tcpdump and
other ethernet traffic analyzers. Summarize accounting data and look at
users' command usage patterns. Know what type of work they are doing,
and nudge them towards using the compute servers etc. if appropriate.

In summary: doing a good configuration requires MY PRECIOUS AND UNIQUELY
DEEP ADVICE, as revealed to the gasping masses in this article [;-) ;-)
;-)], and/or knowledge of OS design principles (e.g. implications of the
sticky bit) and performance characterization (e.g. expected cache hit
rates for various types of files), and willingness to do analysis and
monitoring (essential, because guidelines such as mine must be adapted).

Piercarlo "Peter" Grandi
ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
UUCP: ...!mcvax!ukc!aber-cs!pcg
INET: pcg@cs.aber.ac.uk