Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!spool.mu.edu!uunet!stanford.edu!eos!data.nas.nasa.gov!sun418.nas.nasa.gov!truesdel
From: truesdel@nas.nasa.gov (David A. Truesdell)
Newsgroups: comp.unix.wizards
Subject: Re: Another reason I hate NFS: Silent data loss!
Message-ID:
Date: 19 Jun 91 20:18:08 GMT
References: <27226@adm.brl.mil> <16703.Jun1903.07.1091@kramden.acf.nyu.edu>
Sender: news@nas.nasa.gov
Organization: NAS Program, NASA Ames Research Center, Moffett Field, CA
Lines: 25

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>In article <27226@adm.brl.mil> mike@BRL.MIL ( Mike Muuss) writes:
>> NFS is designed as a reliable protocol. I have pounded more than 250
>> NFS requests/sec against a fileserver, and no data loss.

>In this case the 20 requests came in under 1/50 of a second (somewhat
>smaller, I think, but I don't have good measuring tools). I can't
>sustain this load from one Sun, but a single burst was enough to lose
>data.

>> Things you
>> should check are the number of retransmit's you authorized in /etc/fstab,

>If the number of retransmits runs out, the writing process ``should''
>get an error. Otherwise the implementation is (obviously) buggy.

Why ``should'' it? Your writes probably put their data into the buffer cache
just fine, it's the subsequent flushing of the buffer cache that failed. And
guess what? The write had probably already returned by then.

Or, do you always use O_SYNC when opening files for writing?
--
T.T.F.N.,
dave truesdell (truesdel@nas.nasa.gov)
"Carpe Noctem"
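
The point about write() returning before the buffer cache is flushed can be sketched in C. This is only an illustration under stated assumptions, not code from the thread: write_checked() and its use_osync parameter are hypothetical names, and the sketch assumes a POSIX-style system where O_SYNC and fsync() are available. The idea is that a successful write() only means the data reached the cache, so the caller also has to check fsync() and close() to have a chance of seeing a later flush failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): write len bytes to path and
 * report an error from write(), fsync(), or close().
 * Returns 0 on success, -1 on failure with errno set. */
int write_checked(const char *path, const void *buf, size_t len, int use_osync)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (use_osync)
        flags |= O_SYNC;        /* ask for each write() to be synchronous */

    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                 /* short write: advance and keep going */
        left -= (size_t)n;
    }

    /* write() succeeding only means the data was accepted; force the
     * flush now so a failure can still be reported to this caller. */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can also carry a deferred write error, so check it too. */
    if (close(fd) < 0)
        return -1;
    return 0;
}
```

A caller would treat a -1 return as "the data may not be on the server" and inspect errno. Whether a particular NFS client actually reports a failed flush at fsync() or close() is implementation-dependent, so this narrows the window for silent loss rather than guaranteeing it is closed.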