Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!mit-eddie!genrad!decvax!decwrl!pyramid!oliveb!sun!fatkid!dss
From: dss@fatkid.UUCP
Newsgroups: comp.unix.wizards
Subject: Re: Really, redundant file servers
Message-ID: <12959@sun.uucp>
Date: Mon, 9-Feb-87 17:48:01 EST
Article-I.D.: sun.12959
Posted: Mon Feb 9 17:48:01 1987
Date-Received: Wed, 11-Feb-87 05:15:18 EST
References: <2592@phri.UUCP>
Sender: news@sun.uucp
Reply-To: dss@sun.UUCP (Daniel Steinberg)
Distribution: world
Organization: Sun Microsystems, Mountain View
Lines: 47

In article <2592@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>
>	How hard would it be to incorporate a "shadow server" mode into
>NFS?  Imagine 2 servers serving the same file system.  When a write request
>comes in, both servers do it and the client waits to hear an ack from both
>of them.  When a read request comes in, both servers try to do it and the
>client takes the data from whichever server responds first.

First off, you've made some assumptions about a single shadow....
think, instead, of a set of shadow 'devices' (or filesystems).

Second, if you 'broadcast' read requests to all shadow servers, you're
imposing a lot of distributed overhead for the no-error (99.9%) case.
If, instead, you attempt the read first from a 'primary' source, then
you have to decide which filesystem is primary (it could rotate, but
you'd probably lose any advantages that might be gained from buffered
read-ahead).

In many ways it is simpler to have a physical disk shadow than a
filesystem shadow....there the intent is to provide some reasonable
degree of redundancy in case of hard failure.  Then, the issues are:

	Which disk do you read first?
	If you get an error writing to a shadow, was the 'write'
		successful?
	Are you concerned with data integrity between shadows (i.e.,
		what do you do if one shadow has different data than
		the other)?
	How do you deal with bad-block mapping?

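One possible set of answers to the first two questions can be sketched
in a few lines of Python.  This is only an illustration under stated
assumptions, not a description of any existing driver: MemDisk,
ShadowSet, and ShadowError are hypothetical names, the "disks" are
in-memory dictionaries, reads always try the lowest-numbered trusted
shadow first, and a shadow that misses a write is simply dropped from
the trusted set so that its stale block can never be returned by a
later read.

```python
class ShadowError(Exception):
    """Raised when no trusted shadow can satisfy a request."""


class MemDisk:
    """Hypothetical in-memory stand-in for one shadow disk."""

    def __init__(self):
        self.blocks = {}
        self.fail_reads = False   # set True to simulate a read error
        self.fail_writes = False  # set True to simulate a write error

    def read(self, blkno):
        if self.fail_reads:
            raise OSError("simulated read error")
        return self.blocks[blkno]

    def write(self, blkno, data):
        if self.fail_writes:
            raise OSError("simulated write error")
        self.blocks[blkno] = data


class ShadowSet:
    """A set of shadows with a fixed primary-first read order (assumed policy)."""

    def __init__(self, disks):
        self.disks = list(disks)
        # Indices of shadows that have seen every write so far.
        self.valid = set(range(len(self.disks)))

    def read(self, blkno):
        # Try the primary (lowest index) first; fall back only on error.
        for i in sorted(self.valid):
            try:
                return self.disks[i].read(blkno)
            except OSError:
                continue
        raise ShadowError("no trusted shadow could read block %d" % blkno)

    def write(self, blkno, data):
        # Write to every trusted shadow and remember which ones failed.
        failed = []
        for i in sorted(self.valid):
            try:
                self.disks[i].write(blkno, data)
            except OSError:
                failed.append(i)
        # A shadow that missed this write now holds stale data:
        # stop trusting it rather than risk reading the old block.
        self.valid -= set(failed)
        if not self.valid:
            raise ShadowError("write of block %d failed on every shadow" % blkno)
        return failed
```

Under this policy a write counts as successful if at least one shadow
took it, and the caller gets back the list of shadows that were
dropped.  Resynchronizing a dropped shadow, comparing data between
shadows, and bad-block mapping are all left out of the sketch.
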
Since disk shadowing is fundamentally intended to reduce the number of
read errors (by providing redundancy), all the interesting decisions
must be made when an error occurs on one shadow.  When you shadow
filesystems, you dramatically increase the number of kinds of errors
that can occur.  Consequently, the decision matrix gets far more
complex.

For instance, if you are shadowing filesystems, you cannot tolerate
soft failures (e.g., a timeout) on writing to any of the shadows.  This
is because a subsequent read from that shadow may succeed where the
corresponding write failed, and so return stale data.

Of course, there are also naming-conflict problems: what if creat()
works on one filesystem but not another?  What happens if one
filesystem is used to shadow multiple clients?

But even if you ignored those problems, there's still a whole can of
worms involving failure recovery.  For example, you raised the question
of what happens if one shadow filesystem fills up before another.
Consider, also, what happens when a read request returns end-of-file.
Do you accept this, or do you try all the shadows to see if one got
further (and if it did, is it necessarily correct)?

Daniel Steinberg
(ihnp4|ucbvax)!sun!dss