Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!decwrl!ucbvax!UTAH-20.ARPA!ZELEZNIK
From: ZELEZNIK@UTAH-20.ARPA.UUCP
Newsgroups: mod.computers.apollo
Subject: Alternate Links
Message-ID: <12221858028.8.ZELEZNIK@UTAH-20.ARPA>
Date: Fri, 11-Jul-86 12:31:18 EDT
Article-I.D.: UTAH-20.12221858028.8.ZELEZNIK
Posted: Fri Jul 11 12:31:18 1986
Date-Received: Sat, 12-Jul-86 01:12:40 EDT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 39
Approved: apollo@yale-comix.arpa

In reference to alternate links, we have developed a two-fold approach
to maintaining a consistent environment (at the system, node, and user
levels) across single node failures.

First, each entry directory with "critical" data has a backup location
on some other node.  For system level directories, each backup contains
only those branches which are required to maintain the environment
(e.g., /etc, /sys/tcp, ... objects which exist in only one place).  For
user login directories, the backup is user-specified (usually top level
files and links, user_data, and personal com/bin directories).  In this
way, both the system and user environments are preserved.

A simple "node_down node" command walks the net and uncatalogs the
unavailable node, replacing it with a link to its backup location,
while a corresponding "node_up node" command undoes this action
(execution of these commands is restricted).  This way, we can forget
about what lives where; once the backup locations are established,
everything is handled by simply switching the root level entry for the
unavailable node.

Second, backup node_data directories are provided to preserve the node
environment when diskless partners are down.  Each node has a primary
and a secondary paging partner.  The secondary is maintained with only
the critical files (e.g., startup?*, etc?*, tcp info, ... and any
user-specific files), with all non-essential files removed
periodically.
This reduces the backup size by more than an order of magnitude.

All the necessary syncing for this is done automatically through
scripts running under /etc/cron.  Backups of user login directories,
however, are left to the user.

While far from the elegance of replicated objects, this has provided a
reasonably stable environment during single node failures, and has been
straightforward to maintain.

Contact me if you would like more detail.

Mike Zeleznik
University of Utah
Zeleznik@Utah-20.ARPA
-------
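
As a rough illustration of the root level switch performed by the
"node_down" and "node_up" commands described in the post, here is a
minimal Python sketch.  It is a toy in-memory model, not the actual
scripts: the class NetworkRoot, its methods, the node names "alpha" and
"beta", and the backup paths are all hypothetical, and it only handles a
single node failure, as the post does.

```python
# Toy model of the root level switch.  Everything here is hypothetical:
# the real commands operate on the network root directory, not a dict.

class NetworkRoot:
    def __init__(self, backups):
        # node name -> backup location kept on some other node
        self.backups = dict(backups)
        # each root entry starts out cataloged as the node itself
        self.entries = {name: ("node", name) for name in self.backups}

    def node_down(self, node):
        """Replace the node's root entry with a link to its backup."""
        if node not in self.backups:
            raise KeyError(f"no backup location recorded for {node}")
        self.entries[node] = ("link", self.backups[node])

    def node_up(self, node):
        """Undo node_down: restore the node's own root entry."""
        if node not in self.backups:
            raise KeyError(f"unknown node {node}")
        self.entries[node] = ("node", node)

    def resolve(self, path):
        """Resolve a //node/rest path against the current root entries."""
        if not path.startswith("//"):
            raise ValueError("expected a path of the form //node/...")
        name, _, rest = path[2:].partition("/")
        kind, target = self.entries[name]
        base = "//" + target if kind == "node" else target
        return base + ("/" + rest if rest else "")


root = NetworkRoot({
    "alpha": "//beta/backup/alpha",   # assumed backup layout
    "beta": "//alpha/backup/beta",
})
print(root.resolve("//alpha/etc/passwd"))   # normal case
root.node_down("alpha")
print(root.resolve("//alpha/etc/passwd"))   # now served from the backup
root.node_up("alpha")
print(root.resolve("//alpha/etc/passwd"))   # back to the original node
```

The point of the sketch is that callers keep using the same
//alpha/... path throughout; only the single root entry changes, which
is why nobody has to remember what lives where.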