Path: utzoo!attcan!uunet!world!decwrl!sgi!vjs@rhyolite.wpd.sgi.com
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
Newsgroups: comp.protocols.nfs
Subject: Re: NFS writes and fsync().
Summary: more babble
Message-ID: <72795@sgi.sgi.com>
Date: 21 Oct 90 18:54:47 GMT
References: <1990Oct9.152612@objy.objy.com> <72791@sgi.sgi.com>
Sender: guest@sgi.sgi.com
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 47

[One characteristic of news is that the long-winded cannot see the audience
muttering "shut up, already!" Sorry about this.]

Some have written of paranoia about silently losing data to a server that
crashes after returning NFS_OK and before flushing data to disk. That
worry must be kept in perspective. We must quantify the probabilities of
many failures, and act rationally.

 -until recently, many vendors, including almost all of those who now run
    servers synchronously, ran without UDP checksums. I know of mutual
    customers outraged by that, because they traced silent errors to
    missing UDP checksums.

 -many of us use VME or other busses, which do not have parity, and so
    will occasionally have undetected data corruption.

 -many systems have only byte-parity on RAM, so two cosmic rays in the
    same byte will cause silent corruption. The rest have ECC, which does
    not detect all soft errors.

 -using 1500 byte ethernet blocks, even with UDP checksums, makes an
    undetected error more than 30 times as likely as using 64 byte
    blocks. Recall that one of the determinants of the maximum FDDI block
    size was the probability of an undetected error given 500 stations and
    the limits on LER. We could substantially improve the calculated
    reliability of NFS transmissions by using small blocks. Why doesn't
    the protocol require 64 byte blocks? Why is everyone using 4KB FDDI
    blocks, with the same old 32-bit FCS?

 -all systems have zillions of circuits that are known to have
    "metastable" or "resolver" failures--that is, we know hardware will
    sometimes decide your bit was 1 or 0 when it was really a 0 or 1. We
    all try to choose things so the MTBF of such a failure is long
    compared to that of any other.

 -in the average crash, the update daemon would have you lose the changes
    of the last 15 seconds. The more modern bdflush comes closer to losing
    no work, since it tries to keep the disk continually busy flushing
    blocks.

 -the early 1980's decision to run without UDP checksums, but to run
    servers synchronously, says volumes about the relative probabilities
    of server crashes and network corruption then. My recollection is
    that a server that stayed up for days was a wonder. The market
    requires, and many of us deliver, a different order of server
    reliability today. That Sun now runs with UDP checksums turned on
    says volumes about the low relative probability of Sun server failure
    today.

 -is it the sticky bit that Sun has used for the last 3 years to tell the
    server that a file should be written asynchronously? I remember
    hearing about it at Connectathon-before-last.


Vernon Schryver,   vjs@sgi.com
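
[A minimal sketch of why "even with UDP checksums" some corruption still
goes undetected. It assumes the 16-bit ones' complement sum described in
RFC 1071, applied to the payload alone; a real UDP checksum also covers a
pseudo-header and the UDP header. The function name and the sample payload
are illustrative only. Because addition is commutative, swapping two
16-bit words changes the data without changing the sum, while a single
flipped bit is always caught.]

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


original = b"NFS write block!"               # hypothetical 16-byte payload

# Swap the first two 16-bit words: the payload changes, the checksum does not.
swapped = original[2:4] + original[0:2] + original[4:]

# Flip one bit in the first byte: the checksum changes, so this is detected.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(internet_checksum(original) == internet_checksum(swapped))   # undetected
print(internet_checksum(original) == internet_checksum(flipped))   # detected
```

[The sketch shows only that the checksum has blind spots; it makes no
claim about how often a bus, RAM, or network fault produces such a
pattern.]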