Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!tut.cis.ohio-state.edu!ucbvax!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.unix.internals
Subject: NFS vs communications meduim (was slashes, then NFS devices)
Message-ID: <11030@dog.ee.lbl.gov>
Date: 17 Mar 91 16:30:31 GMT
References: <1991Mar9.170841.4042@panix.uucp>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 61
X-Local-Date: Sun, 17 Mar 91 08:30:31 PST

In article thurlow@convex.com (Robert Thurlow) writes:
>... The only difference is the performance bottleneck due to the network.
>If you crippled your I/O subsystem, you'd see similar things.  Until we
>get new networks that are two orders of magnitude faster, this may be
>the case.

(Rob T is at Convex, so he may actually have disks with real bandwidth;
then the picture changes.)

The bandwidth of your standard, boring old Ethernet is 10 Mb/s, or
about 1.2 MB/s.  The bandwidth of your standard, boring old SCSI disk
without synchronous mode is around 1.5 MB/s.  The latency on your
Ethernet is typically much *lower* than that on your standard boring
SCSI controller (which probably contains a 4 MHz 8085 running
ill-planned and poorly written code, whereas your Ethernet chip has a
higher clock rate and shorter microcode paths).  In other words, they
are fairly closely matched.

So why does NFS performance fall so far below local SCSI performance?
There are many different answers to this question, but one of the
most important is also one of the easiest to cure.

A typical NFS implementation uses UDP to ship packets from one machine
to another.  Its UDP interface typically squeezes about 500 KB/s out
of the Ethernet (i.e., around 42% of the available bandwidth).
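The percentages in this post are plain ratios against the rounded
1.2 MB/s link figure.  A minimal Python sketch of that arithmetic,
assuming 8 bits per byte and ignoring Ethernet framing and inter-frame
gaps; the function names are illustrative only, not from any library:

```python
# Back-of-the-envelope Ethernet throughput arithmetic, using the post's figures.
# Assumptions: 8 bits per byte; Ethernet framing and inter-frame gaps ignored.

def link_bytes_per_s(mbit_per_s: float) -> float:
    """Nominal byte rate of a link given its rate in megabits (10**6 bits) per second."""
    return mbit_per_s * 1_000_000 / 8


def utilization(achieved: float, link: float) -> float:
    """Fraction of the link's byte rate that a transport actually delivers."""
    return achieved / link


raw = link_bytes_per_s(10.0)   # 1,250,000 bytes/s for 10 Mb/s Ethernet
raw_mib = raw / 2**20          # about 1.19 in 2**20-byte megabytes, i.e. roughly 1.2
link = 1.2                     # the post's rounded figure, in MB/s
udp_nfs = 0.5                  # ~500 KB/s from a typical UDP NFS, in MB/s

udp_share = utilization(udp_nfs, link)
print(f"nominal rate: {raw:,.0f} bytes/s ({raw_mib:.2f} MB/s with 2**20-byte MB)")
print(f"UDP NFS share of the link: {udp_share:.0%}")
```

The same ratio applied to the 1.1 MB/s TCP figure below,
utilization(1.1, link), gives the quoted 92%.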
Since UDP is an `unreliable protocol' (in the sense that it is allowed
to drop and reorder packets), the NFS implementation has to duplicate
most of the TCP mechanism to make things reliable.

A good TCP implementation, on the other hand, squeezes about 1.1 MB/s
out of the Ethernet even when talking to user code (talking to user
code is inherently at least slightly more expensive than talking to
kernel code, because you must double-check everything so that users
cannot crash the machine).  This is 92% of the available bandwidth.

Thus, one easy way to improve NFS performance (by a factor of less than
2, unfortunately: even though you may halve the time spent talking,
there is plenty of other unimproved time in there) is to replace the
poor TCP implementations with good ones, and then simply have NFS call
the TCP transport code.  (To talk to existing NFS implementations, you
must also preserve a UDP interface, so you might as well fix that too.)
The reason this is easy is that much of the work has already been done
for you: it appears in the current BSD systems.

As a nice side bonus, TCP NFS works over long-haul and low-speed
networks (including 9600 baud serial links).  A typical UDP NFS does
not, because its retransmit algorithms are wired for local Ethernet
speeds.

Indeed, even if you do go from Ethernet to FDDI, you will find that
your NFS performance is largely unchanged unless you fix the UDP and
TCP code.  (When you fix TCP, you will discover that you also need
window scaling, since the amount of data `in flight' over gigabit
networks is much more than an unscaled TCP window, at most 65,535
bytes, can describe.)

Opening up this bottleneck reveals the next one to be NFS's
write-through cache policy, and now I will stop talking.  (You may
infer anything you like from this phrasing :-) .)
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA        Domain: torek@ee.lbl.gov