Path: utzoo!mnetor!uunet!husc6!mailrus!tut.cis.ohio-state.edu!allosaur.cis.ohio-state.edu!bob
From: bob@allosaur.cis.ohio-state.edu (Bob Sutterfield)
Newsgroups: comp.dcom.lans
Subject: Re: NFS vs RFS (actually, vs Sprite and Andrew)
Message-ID: <8428@tut.cis.ohio-state.edu>
Date: 16 Mar 88 15:42:35 GMT
References: <10370@ut-sally.UUCP> <720@uel.uel.co.uk> <1695@uoregon.UUCP> <45660@sun.uucp>
Sender: news@tut.cis.ohio-state.edu
Organization: The Ohio State University Dept of Computer & Information Science
Lines: 65
Keywords: NFS, ANDREW, TOCS, consistency

In article <45660@sun.uucp> nowicki%rose@Sun.COM (Bill Nowicki) writes:
>In article <1695@uoregon.UUCP>, jqj@uoregon.UUCP (JQ Johnson) writes:
>> ...
>> Problems cited with NFS seem to fall into 2 categories:
>> 1/	scalability to large systems, because of excessive network
>> 	traffic, excessive server cpu loading, or difficulty in 
>> 	administration;
>
>Well of course the authors of systems are going to prefer their own.
>This is called the "Not Invented Here" syndrome.  As for scalability,
>I would estimate about 100,000 installed systems running NFS.
>Last I heard, about 500 run Andrew, and perhaps a few dozen run Sprite.

	Installed base != scalability.  We have encountered scaling
problems in existing implementations of NFS, and we worry about what
happens when we hit scaling problems in the design.

	We have about 130 Suns in our department so far (headed for at
least 250 by September), and each NFS-mounts nine filesystems from one
of our servers.  We only have one server with the horsepower for that:
a Pyramid 98x.  All our Sun-3/180s export only one user filesystem to
the rest of the world, besides taking care of their direct clients.

	In a steady-state network with the usual occasional client
bounce, all is well.  But think about when a few subnets-full of
clients - say 36 or so - are taken off the air while their ND
servers/IP gateways-to-the-backbone are dumped to tape.  When the
clients are brought back up and those mount requests hit the central
server, it's not pretty.

	nfsd has the horsepower, particularly when there are 8 of them
running.  inetd and portmap have no problems directing the traffic.
But each time rpc.mountd tries to service a mount request, it has to
sort through its in-core cache, then flush the state to /etc/rmtab.
By the time rpc.mountd has worked down its request queue, the older
ones have timed out.  Then those clients retry.  rpc.mountd thrashes
and chews up CPU time with abandon for a few days.  We have documented
these effects in well-controlled tests, if you're interested.
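
	For anyone who wants to see the shape of the retry storm without
our test logs, here is a toy model in Python.  It is only a sketch
under assumptions of my own: a single FIFO server, all requests
arriving at once, a fixed per-request service time, a fixed client
timeout, and a client that discards any reply arriving after its
timeout and simply re-sends.  The function name, the time values and
the horizon are hypothetical; only the 36 clients and nine filesystems
come from the numbers above, assuming one mount request per filesystem.

```python
import heapq
from collections import deque

def simulate_mount_storm(n_requests, service_time, timeout, horizon):
    """Toy FIFO server; every request arrives at t=0 (times in ms).

    Returns (completed, wasted): how many clients got a reply in time,
    and how many requests the server processed for nothing.
    """
    queue = deque((0, c) for c in range(n_requests))      # (issue_time, client)
    retries = [(timeout, c) for c in range(n_requests)]   # future retry events
    heapq.heapify(retries)
    done = set()
    clock = 0
    wasted = 0

    while clock < horizon and len(done) < n_requests:
        # Retries that have come due join the back of the queue.
        while retries and retries[0][0] <= clock:
            t, c = heapq.heappop(retries)
            if c not in done:
                queue.append((t, c))
                heapq.heappush(retries, (t + timeout, c))
        if not queue:
            clock = retries[0][0]   # idle until the next retry is due
            continue
        issued, c = queue.popleft()
        clock += service_time       # the server does the work regardless
        if c not in done and clock - issued <= timeout:
            done.add(c)             # reply arrived before the client gave up
        else:
            wasted += 1             # stale or duplicate request: effort lost
    return len(done), wasted

n = 36 * 9                          # 36 clients, nine filesystems each
print(simulate_mount_storm(n, 50, 30000, 600000))    # fast server
print(simulate_mount_storm(n, 500, 30000, 600000))   # slow server
```

	In this model the fast server clears all 324 requests before
anyone times out.  The slow one satisfies only the first 60, then
spends the rest of the ten simulated minutes answering requests whose
senders have already given up, while fresh retries pile up behind
them.  Real clients and a real rpc.mountd will differ in the details.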

	We have found that killing rpc.mountd, cp'ing /dev/null to
/etc/rmtab (which truncates it), and starting a new rpc.mountd with
-d will often get us rolling again for a while, until the thrashing
resumes a few minutes later.  In an hour or two of babysitting, we
can get all our filesystems mounted on all those clients and life
goes on.  The problem seems to be the bottleneck of updating
/etc/rmtab.  We would remove that dependency (if we had our sources
by now), at the cost of breaking only showmount(8).
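
	A back-of-the-envelope count shows why the /etc/rmtab update
could matter so much.  This Python sketch assumes (I have not seen the
sources, so it is only an assumption) that the whole table is written
out again after every successful mount, and compares that with a
hypothetical version that appends one record per mount.  The function
name is made up for illustration.

```python
def entries_written(n_mounts, rewrite_whole_table):
    """Count table entries written to disk as n_mounts arrive one by one."""
    total = 0
    for size_after_mount in range(1, n_mounts + 1):
        # Full rewrite: every existing entry goes out again.
        # Append-only: just the new entry.
        total += size_after_mount if rewrite_whole_table else 1
    return total

clients, filesystems = 130, 9       # figures from this post
n = clients * filesystems           # 1170 mount records
print(entries_written(n, True))     # assumed rewrite-everything behaviour
print(entries_written(n, False))    # hypothetical append-only behaviour
```

	With our 130 clients and nine filesystems each, that comes to
685,035 entry writes for the rewrite-everything case against 1,170
for append-only.  The first grows with the square of the number of
mounts, which is the sort of thing that only hurts once you get big.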

	Note that this is neither a philosophical problem with NFS,
nor a Pyramid-specific bug: According to conversations at UNIForum
with a person from Sun's NFS portability group, all known
implementations of NFS have this feature.  I'm satisfied it's a
problem with the current implementations of the protocol, one that
can be fixed by sufficient beating on vendors, or by a few minutes in
a quiet room with the sources.

	But what design misfeatures might we encounter as we scale
even bigger, that aren't so easily solved?  Would someone from
Berkeley or CMU care to comment upon any specific scaling problems
{\em in the NFS design} that Sprite or Andrew would solve in a large
network of diskless workstations?
-=-
 Bob Sutterfield, Department of Computer and Information Science
 The Ohio State University; 2036 Neil Ave. Columbus OH USA 43210-1277
 bob@cis.ohio-state.edu or ...!cbosgd!osu-cis!bob