Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!spool.mu.edu!uunet!stanford.edu!eos!data.nas.nasa.gov!sun418.nas.nasa.gov!truesdel
From: truesdel@nas.nasa.gov (David A. Truesdell)
Newsgroups: comp.unix.wizards
Subject: Re: Another reason I hate NFS: Silent data loss!
Message-ID: <truesdel.677362688@sun418>
Date: 19 Jun 91 20:18:08 GMT
References: <27226@adm.brl.mil> <16703.Jun1903.07.1091@kramden.acf.nyu.edu>
Sender: news@nas.nasa.gov
Organization: NAS Program, NASA Ames Research Center, Moffett Field, CA
Lines: 25

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>In article <27226@adm.brl.mil> mike@BRL.MIL (Mike Muuss) writes:
>> NFS is designed as a reliable protocol.  I have pounded more than 250
>> NFS requests/sec against a fileserver, and no data loss.

>In this case the 20 requests came in under 1/50 of a second (somewhat
>smaller, I think, but I don't have good measuring tools). I can't
>sustain this load from one Sun, but a single burst was enough to lose
>data.

>> Things you
>> should check are the number of retransmits you authorized in /etc/fstab,

>If the number of retransmits runs out, the writing process ``should''
>get an error. Otherwise the implementation is (obviously) buggy.

Why ``should'' it?  Your writes probably put their data into the buffer cache
just fine; it was the subsequent flushing of the buffer cache that failed.
And guess what?  By then, the write(2) call had probably already returned.
Or do you always use O_SYNC when opening files for writing?
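For illustration, here is a minimal sketch in C, assuming a POSIX system, of the two ways a writer can find out that the flush failed. One is to open the file with O_SYNC, so each write(2) should not return until the data has left the buffer cache (on NFS, until the server has acknowledged it). The other is to call fsync(2) and check the return value of close(2) before trusting the data. write_durably() is a hypothetical helper made up for this example, not part of any library.

```c
#define _XOPEN_SOURCE 700   /* expose O_SYNC, fsync(), ssize_t */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and return 0 only if the
 * data was (as far as the kernel reports) actually flushed; -1 with errno
 * set otherwise. If use_osync is nonzero the file is opened with O_SYNC,
 * so each write(2) waits for the data to reach stable storage. */
int write_durably(const char *path, const void *buf, size_t len, int use_osync)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC | (use_osync ? O_SYNC : 0);
    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the same chunk */
            int saved = errno;
            close(fd);
            errno = saved;         /* report the write error, not close's */
            return -1;
        }
        p += n;                    /* handle short writes */
        len -= (size_t)n;
    }

    /* Redundant but harmless with O_SYNC. Without it, the data may still
     * be sitting in the buffer cache, and this is where a deferred write
     * error can be reported to the caller. */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close(2) can also report a deferred error, so check it too. */
    if (close(fd) < 0)
        return -1;
    return 0;
}
```

Either way, the price is that the writer now waits on the server, which is exactly the delay the buffer cache was there to hide.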
--
T.T.F.N.,
dave truesdell (truesdel@nas.nasa.gov)
"Carpe Noctem"