Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!oliveb!felix!zemon
From: hedrick@topaz.rutgers.edu (Charles Hedrick)
Newsgroups: comp.unix.ultrix
Subject: Re: cc, ld, ar over NFS
Message-ID: <9429@felix.UUCP>
Date: Tue, 13-Oct-87 19:32:34 EDT
Article-I.D.: felix.9429
Posted: Tue Oct 13 19:32:34 1987
Date-Received: Fri, 16-Oct-87 04:57:36 EDT
References: <8597@felix.UUCP>
Sender: zemon@felix.UUCP
Organization: Rutgers Univ., New Brunswick, N.J.
Lines: 48
Approved: zemon@felix.UUCP
Reply-Path:

> does anyone else have this problem? could this be related to an NFS
> timeout problem? the compile completes OK on local disks but fails
> on NFS-mounted disks, which would lead me think it is NFS-related.

We have only used NFS on Suns and Pyramids so far, so I can't give
experiences on Ultrix. But I can comment on whether your problem could
be caused by timeouts. Assuming that NFS has been properly implemented,
you can control the results of a timeout by whether the remote
filesystem is mounted hard or soft.

If it is mounted hard, a timeout simply causes the system to reset some
parameters and try again. The program will not proceed until the data
transfer has succeeded. So with hard mounts, it should (aside from bugs
in NFS) be impossible for network problems to lead to corrupted data.

With soft mounts, at some point NFS will give up and return an error to
the program. If all programs were written ideally, the operation you
were attempting would print some error message and terminate
abnormally. Unfortunately, as we all know, there are Unix programs that
do not bother to check for error returns from read and write. So it is
quite possible that a program could proceed as if the write had
succeeded, and you could end up with corrupted data. For this reason,
all NFS documentation that I have seen cautions against the use of soft
mounts.

In the most recent Sun implementation there is a compromise, the "intr"
option.
This allows you to ^C out of a failing operation (eventually), but
otherwise acts like a normal hard mount (assuming you use "hard" and
"intr").

The biggest problem with NFS is that there is no really nice way to
make the "right" thing happen all the time. With the intr option, if a
server goes down, you can eventually get out of failing programs. But
it may take several ^C's and a fair amount of waiting. "df" is
particularly irritating. What you'd like is that when a system went
down, anything trying to use that file system would somehow magically
get aborted in some unambiguous way, but that for transient failures,
things would retry until they succeed. With the existing Unix and its
utilities, this may be hard to do.

There is one other way to get corrupted data from NFS: through
undetected network problems. For performance reasons Sun disables the
normal UDP checksumming for NFS packets. (One presumes that DEC has not
changed this in Ultrix, though of course they could have.) They depend
entirely upon the Ethernet packet checksums. This should be OK. But we
once had a bad board in a gateway cause data going through that gateway
to be corrupted. These errors were not detected, and we ended up with
bad files.

This certainly sounds scary, but in fact bad hardware can always
corrupt your data. If the same error that happened in the gateway had
happened to one of the end systems, then no checksumming would have
been able to detect the problem.
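
To make the checksum point a little more concrete, here is a minimal
sketch in C of the 16-bit one's-complement sum that UDP uses when
checksumming is turned on (the Internet checksum described in RFC
1071). It is simplified: real UDP also sums a pseudo-header, which is
left out, and the names inet_checksum and block_intact are hypothetical
ones chosen for illustration, not taken from any NFS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement checksum over a buffer, RFC 1071 style.
 * Hypothetical helper; the UDP pseudo-header is deliberately omitted. */
uint16_t inet_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    /* Add the data as a sequence of big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is treated as if padded with a zero byte. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold any carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Receiver-side check: returns 1 if the data still matches the
 * checksum that the sender computed, 0 otherwise. */
int block_intact(const unsigned char *data, size_t len, uint16_t expected)
{
    return inet_checksum(data, len) == expected;
}
```

The sender computes the checksum before the data leaves the machine and
the receiver recomputes it on arrival, so damage done by a gateway in
between shows up as a mismatch. Damage done on an end system before the
sum is computed, or after it is verified, is carried along with a
perfectly good checksum, which is the limitation described above.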
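
On the earlier point about programs that do not check error returns
from write: the following is a minimal sketch, in C, of the kind of
checking that lets a program notice when a soft mount gives up. The
helper name write_all is hypothetical, and the exact errno value a
given NFS client reports after a timeout is not assumed here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes of buf to fd.
 * Returns 0 on success, or -1 on failure with errno left as set by
 * write().  write_all is a hypothetical helper for illustration. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;      /* real failure, e.g. a soft mount giving up */
        }

        /* A short write is not an error; carry on with the remainder. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller that gets -1 back should report the error and stop rather than
carry on as if the data were on disk. The return value of close()
probably deserves the same treatment, since an NFS client may not
report a failed write until then.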