Path: utzoo!attcan!uunet!world!decwrl!sgi!vjs@rhyolite.wpd.sgi.com
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
Newsgroups: comp.protocols.nfs
Subject: Re: NFS writes and fsync().
Summary: more babble
Message-ID: <72795@sgi.sgi.com>
Date: 21 Oct 90 18:54:47 GMT
References: <1990Oct9.152612@objy.objy.com> <thurlow.655748135@convex.convex.com> <72791@sgi.sgi.com>
Sender: guest@sgi.sgi.com
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 47


[one characteristic of news is that the long-winded cannot see the audience
muttering "shut up, already!"   Sorry about this.]

Some have written of paranoia about silently losing data to a server that
crashes after returning NFS_OK and before flushing data to disk.  That
worry must be kept in perspective.  We must quantify the probabilities of
many failures, and act rationally.
 -until recently, many vendors, including almost all of those who now run
    servers synchronously, ran without UDP checksums.  I know of mutual
    customers outraged by that, because they traced silent errors to missing
    UDP checksums.
 -many of us use VME or other busses, which do not have parity, and so
    will occasionally have undetected data corruption.
 -many systems have only byte-parity on RAM, so two cosmic rays in the
    same byte will cause silent corruption.  The rest have ECC, which does
    not detect all soft errors.
 -using 1500 byte ethernet blocks, even with UDP checksums, makes undetected
    errors more than 30 times as likely as using 64 byte blocks.  Recall
    that one of the determinants of the maximum FDDI block size was the
    probability of an undetected error given 500 stations and the limits
    on LER.  We could substantially improve the calculated reliability of
    NFS transmissions by using small blocks.  Why doesn't the protocol
    require 64 byte blocks?  Why is everyone using 4KB FDDI blocks, with
    the same old 32-bit FCS?
 -all systems have zillions of circuits that are known to have "metastable"
    or "resolver" failures--that is, we know hardware will sometimes decide
    your bit was 1 or 0 when it was really a 0 or 1.  We all try to choose
    things so the MTBF of such a failure is long compared to that of any
    other.
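
To put rough numbers on the block-size item above, here is a minimal sketch.
It assumes independent bit errors at a made-up rate, and it assumes that a
corrupted frame slips past a 32-bit FCS with probability 2**-32.  Both are
crude assumptions of mine, and the function names are hypothetical.  In this
model the per-frame ratio comes out close to the ratio of the frame lengths;
it is not meant to reproduce the "30 times" figure, which depends on
assumptions not spelled out here.

```python
import math

CRC_BITS = 32  # Ethernet and FDDI both carry a 32-bit FCS


def p_frame_corrupted(n_bytes, ber):
    """Probability that at least one bit in an n_bytes frame is flipped,
    assuming independent bit errors at rate ber (an assumption)."""
    # -expm1(n * log1p(-ber)) equals 1 - (1 - ber)**n, but stays accurate
    # when ber is tiny.
    return -math.expm1(8 * n_bytes * math.log1p(-ber))


def p_undetected(n_bytes, ber):
    """Rough per-frame probability of an error the FCS misses, assuming a
    corrupted frame passes the check with probability 2**-CRC_BITS."""
    return p_frame_corrupted(n_bytes, ber) * 2.0 ** -CRC_BITS


def ratio(big, small, ber=1e-9):
    """How much more likely an undetected error is per frame of `big`
    bytes than per frame of `small` bytes.  The default ber is made up."""
    return p_undetected(big, ber) / p_undetected(small, ber)


if __name__ == "__main__":
    print("1500 vs 64 bytes:  ", ratio(1500, 64))
    print("4096 vs 1500 bytes:", ratio(4096, 1500))
```

Note that this compares frames, not bytes moved: sending the same data in
more, smaller frames costs more frames, so the per-byte picture differs.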


 -in the average crash, the update daemon would have you lose the changes of
    the last 15 seconds.  The more modern bdflush comes closer to losing no
    work, since it tries to keep the disk continually busy flushing blocks.
 -the early 1980's decision to run without UDP checksums, but to run
    servers synchronously, says volumes about the relative probabilities
    of server crashes and network corruption then.  My recollection is that
    a server that stayed up for days was a wonder.  The market requires,
    and many of us deliver, a different order of server reliability today.
    That Sun now runs with UDP checksums turned on says volumes about
    the low relative probability of Sun server failure today.
 -is it the sticky bit that Sun has used for the last 3 years to tell
    the server that a file should be written asynchronously?  I remember
    hearing about it at Connectathon-before-last.
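
The 15 second figure in the update daemon item above can be checked with a
small simulation.  This is only a sketch: it assumes a sync every 30 seconds
(the traditional interval, not stated above) and a crash at a uniformly
random moment between syncs.  The function name is hypothetical.

```python
import random


def mean_loss_window(sync_interval=30.0, trials=100_000, seed=1):
    """Average time since the last sync at the moment of a crash, assuming
    a sync every sync_interval seconds and a crash time uniformly
    distributed within the interval (both assumptions)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Seconds of unflushed changes that this crash would lose.
        total += rng.uniform(0.0, sync_interval)
    return total / trials


if __name__ == "__main__":
    print("mean seconds of work lost:", mean_loss_window())
```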


Vernon Schryver,    vjs@sgi.com