Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!tut.cis.ohio-state.edu!ucbvax!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.unix.internals
Subject: NFS vs communications medium (was slashes, then NFS devices)
Message-ID: <11030@dog.ee.lbl.gov>
Date: 17 Mar 91 16:30:31 GMT
References: <1991Mar9.170841.4042@panix.uucp> <thurlow.669179279@convex.convex.com>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 61
X-Local-Date: Sun, 17 Mar 91 08:30:31 PST

In article <thurlow.669179279@convex.convex.com> thurlow@convex.com
(Robert Thurlow) writes:
>... The only difference is the performance bottleneck due to the network.
>If you crippled your I/O subsystem, you'd see similar things.  Until we
>get new networks that are two orders of magnitude faster, this may be
>the case.

(Rob T is at convex, so he may actually have disks with real bandwidth;
then the picture changes.)

The bandwidth of your standard, boring old Ethernet is 10 Mb/s, or about 1.2
MB/s.  The bandwidth of your standard, boring old SCSI disk without
synchronous mode is around 1.5 MB/s.  The latency on your Ethernet is
typically much *lower* than that on your standard boring SCSI
controller (which probably contains a 4 MHz 8085 running ill-planned
and poorly-written code, whereas your Ethernet chip has a higher clock
rate and shorter microcode paths).

In other words, they are fairly closely matched.  So why does NFS
performance fall so far below local SCSI performance?

There are many different answers to this question, but one of the most
important is one of the easiest to cure.

A typical NFS implementation uses UDP to ship packets from one machine
to another.  Its UDP interface typically squeezes about 500 KB/s out of
the Ethernet (i.e., around 42% of the available bandwidth).  Since UDP
is an `unreliable protocol' (in the sense that UDP is allowed to drop
and reorder packets), the NFS implementation has to duplicate most of
the TCP mechanism to make things reliable.

A good TCP implementation, on the other hand, squeezes about 1.1 MB/s
out of the Ethernet even when talking to user code (talking to user code
is inherently at least slightly more expensive than talking to kernel
code, because you must double-check everything so that users cannot
crash the machine).  This is 92% of the available bandwidth.
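
To make the arithmetic behind those two percentages explicit, here is a
small sketch.  The throughput figures are the ones quoted above; the
variable and function names are made up for illustration, and it assumes
1 KB = 1000 bytes and the rounded 1.2 MB/s figure for raw Ethernet.

```python
# Figures quoted in the post; names are hypothetical.
ETHERNET_MB_PER_S = 1.2    # 10 Mb/s, rounded as in the post (10/8 = 1.25)
UDP_NFS_MB_PER_S = 0.5     # "about 500 KB/s", assuming 1 KB = 1000 bytes
GOOD_TCP_MB_PER_S = 1.1    # "about 1.1 MB/s"


def utilization(achieved_mb_per_s, link_mb_per_s=ETHERNET_MB_PER_S):
    """Fraction of the raw link bandwidth actually delivered."""
    return achieved_mb_per_s / link_mb_per_s


udp_util = utilization(UDP_NFS_MB_PER_S)
tcp_util = utilization(GOOD_TCP_MB_PER_S)
transport_ratio = GOOD_TCP_MB_PER_S / UDP_NFS_MB_PER_S

print(f"UDP NFS transport: {udp_util:.0%} of the wire")
print(f"good TCP:          {tcp_util:.0%} of the wire")
print(f"TCP moves data {transport_ratio:.1f}x as fast as the UDP path")
```

This reproduces the 42% and 92% figures, and shows the transports
differing by a little over a factor of two.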

Thus, one easy way to improve NFS performance (by a factor of less than
2, unfortunately: even though you may halve the time spent talking,
there is plenty of other unimproved time in there) is to replace the
poor TCP implementations with good ones, and then simply call the TCP
transport code.  (To talk to existing NFS implementations, you must
also preserve a UDP interface, so you might as well fix that too.)  The
reason this is easy is that much of the work has already been done for
you---it appears in the current BSD systems.  As a nice side bonus, TCP
NFS works over long-haul and low-speed networks (including 9600 baud
serial links).  A typical UDP NFS does not, because its retransmit
algorithms are wired for local Ethernet speeds.
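
As a rough illustration of the retransmit point, here is a sketch of an
adaptive timer next to a fixed one.  The estimator follows the general
shape of Jacobson's smoothed round-trip algorithm (gains of 1/8 and 1/4,
timeout = mean + 4 * deviation); it is not the actual BSD code, which
uses scaled integer arithmetic.  The fixed 0.1 s timeout, the clamp
values, the 8 KB request size, and the 10-bits-per-byte serial framing
are all assumptions chosen for the example.

```python
class RttEstimator:
    """Smoothed RTT estimator in the style of Jacobson's algorithm (sketch)."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the mean deviation

    def __init__(self, min_rto=0.2, max_rto=60.0, initial_rto=1.0):
        # Clamp and initial values are hypothetical defaults.
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.initial_rto = initial_rto

    def sample(self, rtt):
        """Fold one measured round-trip time (seconds) into the estimate."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            err = abs(self.srtt - rtt)
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * err
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

    def rto(self):
        """Current retransmit timeout in seconds, clamped to [min, max]."""
        if self.srtt is None:
            return self.initial_rto
        raw = self.srtt + 4 * self.rttvar
        return max(self.min_rto, min(self.max_rto, raw))


# Assumed: one 8 KB transfer over a 9600 baud line, 10 bits per byte.
serial_rtt = 8192 * 10 / 9600      # roughly 8.5 seconds
FIXED_TIMEOUT = 0.1                # hypothetical Ethernet-tuned constant

est = RttEstimator()
for _ in range(5):
    est.sample(serial_rtt)

print(f"time for one reply on the serial link: {serial_rtt:.1f} s")
print(f"fixed timer fires after:               {FIXED_TIMEOUT:.1f} s")
print(f"adaptive timer fires after:            {est.rto():.1f} s")
```

The fixed timer expires long before a reply could possibly arrive, so
the client retransmits into a link that is already saturated; the
adaptive timer settles at a value above the measured round trip.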

Indeed, even if you do go from Ethernet to FDDI, you will find that your
NFS performance is largely unchanged unless you fix the UDP and TCP code.
(When you fix TCP, you will discover that you also need window scaling,
since the amount of data `in flight' over gigabit networks is much more
than an unscaled TCP window can describe.)
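
A quick sketch of why scaling is needed: the TCP window field is 16
bits, so an unscaled window describes at most 65535 bytes, while the
data `in flight' is bandwidth times round-trip time.  The round-trip
times below are invented for illustration, and the shift count assumes
a window scale option that left-shifts the advertised window.

```python
MAX_UNSCALED_WINDOW = 65535   # largest value of the 16-bit window field


def bytes_in_flight(bits_per_second, rtt_seconds):
    """Bandwidth-delay product: bytes needed to keep the pipe full."""
    return bits_per_second / 8 * rtt_seconds


def scale_shift_needed(bdp_bytes):
    """Smallest left shift that lets a 16-bit window cover bdp_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < bdp_bytes:
        shift += 1
    return shift


# (name, bits per second, assumed round-trip time in seconds)
links = [
    ("Ethernet, 1 ms RTT", 10e6, 0.001),
    ("gigabit, 1 ms RTT", 1e9, 0.001),
    ("gigabit, 60 ms RTT", 1e9, 0.060),
]

for name, bps, rtt in links:
    bdp = bytes_in_flight(bps, rtt)
    print(f"{name}: {bdp:.0f} bytes in flight, "
          f"shift {scale_shift_needed(bdp)}")
```

On the Ethernet the unscaled window is ample; at gigabit rates even a
1 ms round trip overflows it.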

Opening up this bottleneck reveals the next one to be NFS's write-through
cache policy, and now I will stop talking.  (You may infer anything you
like from this phrasing :-) .)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov