Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!mit-eddie!genrad!decvax!decwrl!pyramid!oliveb!sun!fatkid!dss
From: dss@fatkid.UUCP
Newsgroups: comp.unix.wizards
Subject: Re: Really, redundant file servers
Message-ID: <12959@sun.uucp>
Date: Mon, 9-Feb-87 17:48:01 EST
Article-I.D.: sun.12959
Posted: Mon Feb  9 17:48:01 1987
Date-Received: Wed, 11-Feb-87 05:15:18 EST
References: <2592@phri.UUCP>
Sender: news@sun.uucp
Reply-To: dss@sun.UUCP (Daniel Steinberg)
Distribution: world
Organization: Sun Microsystems, Mountain View
Lines: 47

In article <2592@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>
>	How hard would it be to incorporate a "shadow server" mode into
>NFS?  Imagine 2 servers serving the same file system.  When a write request
>comes in, both servers do it and the client waits to hear an ack from both
>of them.  When a read request comes in, both servers try to do it and the
>client takes the data from whichever server responds first.

First off, you've assumed a single shadow.  Think, instead, of a set
of shadow 'devices' (or filesystems).
Second, if you 'broadcast' read requests to all shadow servers, every
server does the work of every read, which imposes a lot of distributed
overhead for the no-error (99.9%) case.
If, instead, you attempt the read from a 'primary' source first,
then you have to decide which filesystem is primary (it could rotate,
but you'd probably lose any advantage gained from buffered read-ahead).
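
To put rough numbers on that overhead, here is a minimal sketch that
counts requests under the two read strategies.  Everything in it is
hypothetical: the function names, the three-shadow setup, and the
worst-case fallback rule are assumptions for illustration, and the
one-failure-per-thousand rate is just the 99.9% figure restated.

```python
def broadcast_read_cost(num_shadows, num_reads):
    """Every read goes to every shadow, whether or not anything failed."""
    return num_shadows * num_reads


def primary_first_read_cost(num_shadows, num_reads, num_failed_reads):
    """Each read goes to the primary; only failed reads fall back.

    Assumption: a failed read is retried on every remaining shadow,
    which is the worst case for this strategy.
    """
    fallback = num_failed_reads * (num_shadows - 1)
    return num_reads + fallback


shadows = 3      # hypothetical number of shadow servers
reads = 1000     # read requests issued by the client
failed = 1       # 0.1% of reads hit an error on the primary

broadcast = broadcast_read_cost(shadows, reads)
primary = primary_first_read_cost(shadows, reads, failed)
print(broadcast, primary)
```

Under these assumptions the broadcast scheme issues roughly three
times the requests.  The sketch says nothing about latency or about
the read-ahead question, which is where choosing a primary gets hard.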

In many ways it is simpler to have a physical disk shadow than a
filesystem shadow.  There, the intent is to provide some reasonable
degree of redundancy in case of hard failure.  Then, the issues are:
    Which disk do you read first?
    If you get an error writing to a shadow, was the 'write' successful?
    Are you concerned with data integrity between shadows (i.e., what
    do you do if one shadow has different data than the other)?
    How do you deal with bad-block mapping?
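
The second question has at least two defensible answers, and the
choice is a policy decision rather than something the hardware settles
for you.  A minimal sketch, assuming each shadow simply reports success
or failure for a write; the function and the policy names 'all' and
'any' are made up for illustration:

```python
def write_succeeded(results, policy):
    """Decide whether a shadowed write counts as successful.

    results: one boolean per shadow (True means that shadow's write worked).
    policy:  'all' requires every shadow to succeed;
             'any' accepts the write if at least one shadow has the data.
    """
    if policy == "all":
        return all(results)
    if policy == "any":
        return any(results)
    raise ValueError("unknown policy: %r" % (policy,))


# One shadow took the write, the other reported an error.
outcome = [True, False]
strict = write_succeeded(outcome, "all")
lenient = write_succeeded(outcome, "any")
print(strict, lenient)
```

With 'all' you keep the shadows identical but lose availability on any
single error; with 'any' you stay up but now own the integrity problem
in the next question.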

Since disk shadowing is fundamentally intended to reduce the number of
read errors (by providing redundancy), all the interesting decisions
must be made when an error occurs on one shadow.  When you shadow
filesystems, you dramatically increase the number of types of errors
that can occur.  Consequently, the decision matrix gets far more complex.

For instance, if you are shadowing filesystems, you cannot tolerate
soft failures (e.g., timeouts) on writes to any of the shadows.
This is because a subsequent read from that shadow may succeed even
though the corresponding write failed, and so return stale data.
Of course, there are also naming-conflict problems: what if creat()
works on one filesystem but not another?  What happens if one
filesystem is used as a shadow for multiple clients?
But even if you ignored those problems, there's still a whole can
of worms involving failure recovery.  For example, you raised the
question of what happens if one shadow filesystem fills up before
another.  Consider, also, what happens when a read request returns
end-of-file.  Do you accept this or do you try all the shadows to
see if one got further (and if it did, is it necessarily correct)?
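
Both the stale-read and the end-of-file cases can be reproduced with a
toy model.  This is only a sketch under stated assumptions: the Shadow
class, its drop_next_write flag, and the helper functions are all
hypothetical, an in-memory byte array stands in for a file on each
shadow, and a 'soft failure' is modeled as a write that stores nothing.

```python
class Shadow:
    """Hypothetical in-memory stand-in for one shadow copy of a file."""

    def __init__(self, name):
        self.name = name
        self.data = bytearray()
        self.drop_next_write = False   # simulate one timed-out write

    def write(self, offset, chunk):
        """Store chunk at offset; return False on a simulated soft failure."""
        if self.drop_next_write:
            self.drop_next_write = False
            return False               # nothing was stored
        self.data[offset:offset + len(chunk)] = chunk
        return True

    def read(self, offset, length):
        """Return up to length bytes; b"" means end-of-file."""
        return bytes(self.data[offset:offset + length])


def write_all(shadows, offset, chunk):
    """Write to every shadow; return the names of those that failed."""
    return [s.name for s in shadows if not s.write(offset, chunk)]


def read_all(shadows, offset, length):
    """Ask every shadow for the same range and collect the answers."""
    return {s.name: s.read(offset, length) for s in shadows}


a, b = Shadow("a"), Shadow("b")
write_all([a, b], 0, b"old!")

# Case 1: an overwrite times out on b, so b still serves the old data.
b.drop_next_write = True
failed_overwrite = write_all([a, b], 0, b"new!")
stale = read_all([a, b], 0, 4)

# Case 2: an append times out on b, so b reports EOF where a has data.
b.drop_next_write = True
failed_append = write_all([a, b], 4, b"tail")
at_end = read_all([a, b], 4, 4)

print(stale, at_end)
```

In both cases every read 'succeeds', so nothing at read time tells the
client that shadow b is wrong.  The sketch detects the disagreement
only because it asks all the shadows; it does not answer which one to
believe.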

Daniel Steinberg
(ihnp4|ucbvax)!sun!dss