Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!cbosgd!ucbvax!hoptoad!gnu
From: gnu@hoptoad.UUCP
Newsgroups: comp.unix.wizards
Subject: Re: Backups on Live Systems
Message-ID: <2248@hoptoad.uucp>
Date: Thu, 4-Jun-87 06:13:36 EDT
Article-I.D.: hoptoad.2248
Posted: Thu Jun 4 06:13:36 1987
Date-Received: Sat, 6-Jun-87 06:18:55 EDT
References: <132@dvm.UUCP> <725@aramis.rutgers.edu> <20070@sun.uucp>
Organization: Nebula Consultants in San Francisco
Lines: 55

The first thing to do is make damn sure that there are *no* inputs that
can crash the restore program. A program whose main use is recovery
from catastrophe should be utterly reliable, or what use are all those
dump tapes?

I didn't believe it when I tried restoring from a fulldump plus an
incdump, both done with the system single-user; restore screwed up!
(SunOS 3.0) This was, and is, unthinkable to me, but I have a
timesharing service bureau background (STSC), where we would change 30
disk packs for empty ones once a week and RESTORE onto the new ones, to
make sure our tapes were good and because it also gave us a week-old
full backup on disks in case we needed it. That data was not just
valuable; it was our CUSTOMERS' data and our lifeblood. You just can't
tell a customer paying for computer time and storage that you zapped
her files and can she please recreate them. They go elsewhere.

An APL timesharing system I used in Toronto developed a system for
online backups; they would 'freeze' all access to files in each user's
directory while dumping those files. (It was not a hierarchical file
system.)

Freezing file access on Unix could be done for particular directory
trees, indicated to dump by a control file, or by the presence in the
file-system-being-dumped of ".dumpfreeze" files or some such. E.g.
each user's home directory could contain one, so that user would be
likely to see a consistent dump. Such a freeze should probably hang
new accesses (e.g.
open or creat) while allowing read/write/close to continue for a few
seconds, to give running programs a shot at getting files into a
consistent state before the dump.

Bill asked whether getting a consistent picture would have to be done
through the file system rather than by reading the raw device. I think
so. This need not be construed as a performance liability; probably
adding a few well-chosen primitives to the file system could make dump
actually run faster, since the kernel can presumably do *anything*
faster than a user program if it wants to. This would reduce the
number of programs that actually handle a raw file system down to 3
(mkfs, kernel and fsck).

The chosen primitives should be general, e.g. should be usable by users
doing their own dumps (no superuser required if you have access to all
the files you are dumping) and could also be usable by other programs,
e.g. 'find' or 'tar'.

One primitive could take a pathname, a time, a buffer, and an integer,
and search starting from the path for files whose inode timestamp is
later than the time, putting their relative pathnames into the buffer,
and optionally exclusive-opening up to N of them for the caller
(putting the fd's into the buffer too). Files that could not be
exclusively opened would be returned in the buffer with an fd of -1.

I can think of holes in this (e.g. once one bufferload has been
returned, how to restart the search? how to indicate whether
directories being searched should remain locked against new
links/creats? etc...), but clean filesystem interfaces for dumping is
an area that would bear fruit, if only sour grapes :-|.
-- 
Copyright 1987 John Gilmore; you may redistribute only if your recipients may.
(This is an effort to bend Stargate to work with Usenet, not against it.)
{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu    gnu@ingres.berkeley.edu
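
For illustration only, here is a rough user-space sketch of the search
primitive described above (pathname, time, buffer, integer). It is not
the kernel interface the post proposes, just an emulation under stated
assumptions: the name find_changed and its parameters are hypothetical,
a returned list stands in for the caller's buffer, "inode timestamp" is
taken to mean st_ctime, and "exclusive open" is approximated with a
non-blocking advisory flock(), which only keeps out other programs that
also use flock(). It returns everything in one call, so it sidesteps
the restart question rather than answering it.

```python
import fcntl
import os
import stat


def find_changed(root, since, max_open=0):
    """Hypothetical emulation of the proposed search primitive.

    Walks `root` and returns a list of (relative_path, fd) pairs for
    regular files whose inode change time (st_ctime) is later than
    `since` (seconds since the epoch).  Up to `max_open` files are
    opened read-only and locked with a non-blocking exclusive flock();
    files beyond that limit, or files that cannot be opened and locked,
    are reported with an fd of -1.
    """
    results = []
    opened = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # file vanished while we were walking
            if not stat.S_ISREG(st.st_mode) or st.st_ctime <= since:
                continue
            fd = -1
            if opened < max_open:
                try:
                    fd = os.open(full, os.O_RDONLY)
                    # Advisory lock only: an assumption standing in for
                    # a real exclusive open.
                    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    opened += 1
                except OSError:
                    if fd != -1:
                        os.close(fd)
                    fd = -1
            results.append((os.path.relpath(full, root), fd))
    return results
```

A dump-like caller might use find_changed(home, last_dump_time,
max_open=16), copy out each file whose fd is not -1, close the fd to
drop the lock, and come back later for the files reported with -1.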