Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!oliveb!felix!zemon
From: hedrick@topaz.rutgers.edu (Charles Hedrick)
Newsgroups: comp.unix.ultrix
Subject: Re: cc, ld, ar over NFS
Message-ID: <9429@felix.UUCP>
Date: Tue, 13-Oct-87 19:32:34 EDT
Article-I.D.: felix.9429
Posted: Tue Oct 13 19:32:34 1987
Date-Received: Fri, 16-Oct-87 04:57:36 EDT
References: <8597@felix.UUCP>
Sender: zemon@felix.UUCP
Organization: Rutgers Univ., New Brunswick, N.J.
Lines: 48
Approved: zemon@felix.UUCP

Reply-Path:


> does anyone else have this problem?  could this be related to an NFS 
> timeout problem?  the compile completes OK on local disks but fails 
> on NFS-mounted disks, which would lead me to think it is NFS-related.

We have only used NFS on Suns and Pyramids so far, so I can't give
experiences on Ultrix.  But I can comment on whether your problem
could be caused by timeouts.  Assuming that NFS has been properly
implemented, you can control the results of a timeout by whether the
remote filesystem is mounted hard or soft.  If it is mounted hard, a
timeout simply causes the system to reset some parameters and try
again.  The program will not proceed until the data transfer has
succeeded.  So with hard mounts, it should (aside from bugs in NFS) be
impossible for network problems to lead to corrupted data.  With soft
mounts, at some point NFS will give up and return an error to the
program.  If all programs were written ideally, the operation you were
attempting would print some error message and terminate abnormally.
Unfortunately, as we all know, there are Unix programs that do not
bother to check for error returns from read and write.  So it is quite
possible that a program could proceed as if the write had succeeded,
and you could end up with corrupted data.  For this reason, all NFS
documentation that I have seen cautions against the use of soft
mounts.  In the most recent Sun implementation there is a compromise,
the "intr" option.  This allows you to ^C out of a failing operation
(eventually), but otherwise acts like a normal hard mount (assuming
you use "hard" and "intr").  
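
To make the point about unchecked error returns concrete, here is a
minimal sketch in C of a write that does check.  The name write_all is
hypothetical (it is not from any system library), and the exact errno
a soft mount reports after giving up varies between implementations,
so the code simply passes on whatever write(2) sets.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write exactly len bytes from buf to fd.
 * Returns 0 on success, or -1 with errno left as set by write(2).
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: try again */
            return -1;      /* real failure; the caller must not ignore it */
        }
        p += n;             /* a short write is legal: advance and loop */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller that gets -1 back should print a message (perror is enough)
and exit with a non-zero status rather than carry on.  It is also
worth checking the return value of close, since a client may not
report a failed NFS write until then.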

The biggest problem with NFS is that there is no really nice way to
make the "right" thing happen all the time.  With the intr option, if a
server goes down, you can eventually get out of failing programs.  But
it may take several ^C's and a fair amount of waiting.  "df" is
particularly irritating.  What you'd like is that when a system goes
down, anything trying to use that file system gets aborted in some
unambiguous way, but that for transient failures, things retry until
they succeed.  With the existing Unix and its utilities, this may be
hard to do.

There is one other way to get corrupted data from NFS: through
undetected network problems.  For performance reasons Sun disables the
normal UDP checksumming for NFS packets.  (One presumes that DEC has
not changed this in Ultrix, though of course they could have.)  NFS
then depends entirely upon the Ethernet packet checksums.  This should
be OK on a single Ethernet, but those checksums only protect a packet
on each individual hop.  We once had a bad board in a gateway cause
data going through that gateway to be corrupted.  These errors were
not detected, and we
ended up with bad files.  This certainly sounds scary, but in fact bad
hardware can always corrupt your data.  If the same error that
happened in the gateway had happened to one of the end systems, then
no checksumming would have been able to detect the problem.
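
For reference, the end-to-end check being skipped is the 16-bit
one's-complement Internet checksum.  The following is a minimal sketch
in C of that sum over a buffer.  It is an illustration, not Sun's or
DEC's code: inet_checksum is a hypothetical name, and a real UDP
checksum also covers a pseudo-header, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 16-bit one's-complement checksum over data,
 * treating the bytes as big-endian 16-bit words.  The 32-bit
 * accumulator is ample for packet-sized buffers.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;      /* pad a trailing odd byte with zero */

    while (sum >> 16)                       /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The receiver recomputes the sum over what arrived and compares it with
the value the sender put in the packet, so a byte damaged in a gateway
along the way shows up as a mismatch.  If the data is already wrong
when the sender computes the sum, the checksum matches the bad data
and nothing is detected, which is the end-system case described above.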