Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!columbia!rutgers!clyde!cbatt!cbuxc!cbuxb!cbrma!karl
From: karl@cbrma.UUCP (Karl Kleinpaste)
Newsgroups: comp.unix.wizards
Subject: Re: NFS [un]reliability
Message-ID: <5377@cbrma.UUCP>
Date: Sun, 9-Nov-86 21:37:40 EST
Article-I.D.: cbrma.5377
Posted: Sun Nov 9 21:37:40 1986
Date-Received: Mon, 10-Nov-86 22:14:07 EST
References: <1823@rlvd.UUCP>
Organization: AT&T-BL, RMAS, Columbus
Lines: 49

mike@louis.UUCP writes:
>Recently we have been doing a study of NFS fileservers and we have
>come across unreliability in NFS (i.e writing something to a remote
>file and finding something different when reading it back) when the
>server was under extreme load. Now we are starting to notice the same
>behaviour on our existing Sun fileservers.
>
>The question is, have other noticed this and does anyone know why
>it happens? [mumble]

Yes, I've seen such a thing. At OSU, there is a small set of Suns
(11?), 3 of which are Sun-2s and the rest are recently-purchased
Sun-3s. Unfortunately, one of the Sun-2s is the server for *all* the
rest. Some would call this a Bad Thing, and they would be right. It
is equipped with 2 Eagle drives for a decent amount of disc, and all
those other Suns are usually quite busy during office hours.

This problem was first noticed in, of all things, the "hack" game, and
more recently in GNU Emacs. GNU Emacs has lisp code to detect whether
a file has changed on disc more recently than the last time the
current user either read the file in or wrote his changes out.
Periodically, when the server node is seriously overloaded (which is
the case more and more often), GNU Emacs utters the evil phrase, "File
has changed on disc; save anyway [y or n]?"

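For anyone who wants to see the shape of the check just described,
here is a minimal sketch in Python rather than Emacs Lisp. It is not
GNU Emacs's actual code: the names `Buffer`, `save`, `changed_on_disk`
and `demo` are made up for illustration, and the `os.utime` call only
simulates a modification time that moves forward after the editor has
already recorded it, which is the effect the overloaded server is
suspected of having.

```python
import os
import tempfile


class Buffer:
    """Hypothetical stand-in for an editor buffer visiting one file."""

    def __init__(self, path):
        self.path = path
        self.saved_mtime = None  # mtime seen at our last read or write

    def save(self, text):
        with open(self.path, "w") as f:
            f.write(text)
        # Stat the file right after writing and remember what it said.
        self.saved_mtime = os.stat(self.path).st_mtime_ns

    def changed_on_disk(self):
        # A later mtime than the one we recorded looks like someone
        # else modified the file behind our back.
        return os.stat(self.path).st_mtime_ns > self.saved_mtime


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "joe.txt")
        buf = Buffer(path)
        buf.save("first draft\n")
        before = buf.changed_on_disk()

        # Assumption: simulate a write that finishes late by pushing
        # the file's mtime 5 seconds past the value we recorded.
        st = os.stat(path)
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 5_000_000_000))
        after = buf.changed_on_disk()
        return before, after


if __name__ == "__main__":
    print(demo())
```

The first check comes back clean; the second one is where the editor
would ask its question, even though nobody but Joe touched the file.
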
It is *believed* (that is, we can't quite prove it yet) that this is
due to the sequence of events where [a] Joe User saves his file, which
causes additional work for an already-overloaded server, [b] GNU Emacs
stat(2)'s the file to get its modification time, but [c] the server is
so overloaded that the file wasn't finished being written at the time
of the stat(2), so [d] Joe goes on and hacks at his file a while
longer, [e] issues another save for it, at which time [f] GNU Emacs
stat(2)'s the file again, compares it against its saved write-time,
and [g] finds that the last modification time is later than the saved
write-time. Potent words of evil tend to get uttered by Joe when he
sees GNU Emacs' comment, because (generally speaking) he hasn't the
FAINTEST idea what caused it.

> And, of course, does anyone know how to stop it?

OSU is choosing to solve the whole problem (that is, overall
performance, not just GNU Emacs and similar programs' foolish
comments) by replacing the Sun-2 file server with >1 Sun-3 file
servers. You do what you have to. Unfortunately, it costs
significant $$$ to do what you have to in such cases.
--
Karl Kleinpaste