Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!haven!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.unix.admin
Subject: Re: Why idle backups??
Message-ID: <27337@mimsy.umd.edu>
Date: 31 Oct 90 23:58:21 GMT
References: <547@fciva.FRANKLIN.COM> <1642@sirius.ucs.adelaide.edu.au> <3212@ucsfcca.ucsf.edu> <32749@sparkyfs.istc.sri.com> <339@gallium.UUCP> <1990Oct24.210312.3271@cubmol.bio.columbia.edu> <1990Oct24.151840.2
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 203

In article <32749@sparkyfs.istc.sri.com> zwicky@sparkyfs.istc.sri.com
(Elizabeth Zwicky) answers the `subject' question.

Five articles later, koppenh@informatik.uni-stuttgart.de (Andreas
Koppenhoefer) and, in <339@gallium.UUCP>, garyb@gallium.UUCP (Gary
Blumenstein) ask for the Purdue mods mentioned there. Equivalent mods
are already included in recent versions of `dump' (as distributed by
Berkeley since 4.3-tahoe if not earlier, by Sun since 4.0.3 if not
earlier, and presumably by DEC by 2001 if not earlier :-) ). The actual
changes are:

 1. Add a `dirdump' routine to dumptraverse.c, and use it in dumpmain.c
    for pass III (directory dump) by changing

        pass(dump, dirmap);

    to

        pass(dirdump, dirmap);

    where dirdump(ip) simply calls dump(ip) if and only if
    (ip->di_mode & IFMT) == IFDIR. This prevents `restore' from seeing
    a regular file in the middle of the directory listing, which
    hopelessly confuses old versions of restore (and possibly new ones
    as well). Such things happen if a directory is deleted and its
    inode is reused as a regular file before dump manages to reach it.
    (More on this below.)

 2. Add code to dump() (also in dumptraverse.c) to skip a file if its
    mode (ip->di_mode) is 0, i.e., the inode is no longer in use. This
    happens whenever a file or directory is deleted and the inode is
    *not* reused.
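The two changes above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the actual BSD source: `struct dinode' is cut down to just di_mode, the bitmap layout in pass() is assumed, and the `ndumped' counter is a hypothetical stand-in for writing an inode to tape.

```c
#include <stddef.h>

/* File-type bits, with the traditional UNIX <sys/inode.h> values. */
#define IFMT  0170000   /* mask for the type-of-file bits */
#define IFDIR 0040000   /* directory */
#define IFREG 0100000   /* regular file */

/* Hypothetical, cut-down on-disk inode: only the mode matters here. */
struct dinode {
	unsigned short di_mode;
};

/* Hypothetical stand-in for "write this inode to tape": just count. */
int ndumped;

/* Change 2: skip an inode freed since pass I (mode 0 = not in use). */
void dump(struct dinode *ip)
{
	if (ip->di_mode == 0)
		return;
	ndumped++;	/* the real dump() writes TS_INODE and data blocks */
}

/* Change 1: in pass III, dump only inodes that are still directories. */
void dirdump(struct dinode *ip)
{
	if ((ip->di_mode & IFMT) == IFDIR)
		dump(ip);
}

/*
 * Simplified pass(): apply fn to each inode whose bit is set in map.
 * Assumed layout: inode i is bit (i % 8) of byte (i / 8).
 */
void pass(void (*fn)(struct dinode *), const unsigned char *map,
    struct dinode *inodes, size_t ninodes)
{
	size_t i;

	for (i = 0; i < ninodes; i++)
		if (map[i / 8] & (1u << (i % 8)))
			fn(&inodes[i]);
}
```

With a directory map that still marks an inode since reused as a regular file, pass(dirdump, dirmap) writes only the genuine directories, so the first non-directory on the tape really does mark the end of the directory section.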

In <1990Oct24.210312.3271@cubmol.bio.columbia.edu>
ping@cubmol.bio.columbia.edu (Shiping Zhang) asks how to put a complete
backup onto no more than one tape. This is easily accomplished by buying
an 8mm Exabyte drive, unless you have disks that hold more than 2 GB.
(DAT drives will also work, but they hold less data and cost more. New
Exabyte hardware that stores over 4 GB per tape is now, or will soon
be, available as well.)

In <1990Oct24.151840.25570@ccad.uiowa.edu> emcguire@ccad.uiowa.edu
(Ed McGuire) asks about validating a dump. This is difficult, as
Elizabeth Zwicky describes in <32757@sparkyfs.istc.sri.com>:

>1) Some individual file may be missing or damaged; without
>attempting to restore that particular file, you will never know.

Although restore does not do this now, it would not be difficult to
write a program that compares the bitmaps at the front of the tape with
the TS_INODE header records to verify that every file is on the tape.
Files that were removed and not replaced, or directories that were
removed and replaced with something other than another directory, will
of course be `missing'.

>2) Some individual file may be damaged so that any attempt to
>read it confuses restore permanently.

Any such thing points to a bug in restore. Restore should be (but
perhaps is not) able to recover from such things. Naturally, the
damaged file itself will not be recoverable.

Events 1) and 2) are most likely to happen when a file changes size
while that file is being dumped. (Dump reads the inode, then the direct
block contents, then the indirect blocks and their contents, all the
while assuming that this data is valid.) This should merely cause the
tape data to be invalid, and should not give restore fits.

Note that restoring such a file could breach security. For instance,
the sequence of events could be:

 A. dump discovers a 100 MB file;
 B. dump begins dumping the file;
 C. the file is truncated;
 D. the blocks for that file are allocated to a new, high-security
    (mode 0600) file owned by someone else;
 E. dump finishes dumping the file.

The resulting tape holds up to 100 MB of high-security file contents
attached to the original user id. When restored, the 100 MB file
`reappears', still owned by the original user, but holding (some of)
the other user's private data instead of its original contents.

>(Since restore doesn't tell you what it's trying to restore, only what
>it has finished restoring, if you run into one of these when trying to
>restore, you get to play binary search, doing add and extracts on
>subsets of your original file list until you have everything but the
>bad one. Ick.)

Actually, you can run `restore iv', `add' what you like, `extract', and
note the name and/or inode number of the last file printed. Then run
`restore t | sort -n' and find the next higher inode number in that
listing: that is the file that is causing restore to hang.
(`restore rv' will also work. Be sure to use a CRT so as not to waste
paper.)

>3) At some point, some file may be screwed enough to corrupt
>everything after it ...

Again, this should never happen (but probably can). In particular, this
used to happen with the 4.2BSD dump/restore when pass(dump, dirmap)
wound up dumping a regular file (see change 1 near the top of this
article); this has been fixed.

>4) There may be physical write or read errors on the tape.

Good hardware will detect these while the tape is being written, though
of course marginal defects may escape notice the first few times.

In another article, whose ID I foolishly forgot to note, Dick Karpinski
suggests that dump ought to be able to produce a correct dump (slowly)
even when the file system is active, perhaps (my interpretation) by
using some other algorithm. The answer to this is `no and yes': it
could, but only by using a staging area at least as large as the final
backup, and by taking potentially unbounded time. The reason for this
is simple, though the details are not.
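For the `restore iv' / `restore t | sort -n' trick described above, the only real work is finding the smallest inode number greater than the last one restore reported. A minimal C sketch, assuming the inode numbers have already been pulled out of the `restore t' listing into an array; next_inode() and cmp_ino() are hypothetical helpers, not part of restore:

```c
#include <stdlib.h>

/* qsort comparator for inode numbers (the `sort -n' step). */
static int cmp_ino(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: given the inode numbers from `restore t' and the
 * inode number of the last file `restore iv' reported, return the next
 * higher inode number, i.e. the file to suspect.  Returns 0 (never a
 * valid inode number) if there is none.  Sorts inos in place.
 */
unsigned long next_inode(unsigned long *inos, size_t n, unsigned long last)
{
	size_t lo = 0, hi = n;

	qsort(inos, n, sizeof inos[0], cmp_ino);
	/* Binary search for the first entry strictly greater than last. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (inos[mid] <= last)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < n ? inos[lo] : 0;
}
```

For example, with a listing containing inodes 2, 5, 9, 17 and 31 and a hang right after inode 9 was printed, next_inode() returns 17.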

The tapes produced by dump are intended to be a complete snapshot of the
state of the file system, ordered so that restores are not too
difficult, but not so strictly that dumps are slow. (Some may argue with
the latter statement. :-) ) To this end, the contents of an infinitely
long tape are:

 A. A `TS_TAPE' record naming the dump time, level, etc.
 B. A bitmap of clear inodes (i.e., those that are not holding any
    file, of any kind), prefixed by a `TS_CLRI' record. This is used to
    tell which files have been removed since the previous dump (so that
    `restore r' can put things back as they were).
 C. A bitmap of set inodes (those that are holding files), prefixed by
    a `TS_BITS' record.
 D. All the directories needed to produce complete path names to all
    the files on the tape. These are a series of (TS_INODE, blocks,
    TS_ADDR, blocks, TS_ADDR, blocks, ...) records, where each TS_INODE
    or TS_ADDR record contains enough information to tell how many
    `blocks' follow it on the tape. (Holes in files result in unwritten
    `blocks', so a file consisting entirely of a hole has only TS_INODE
    and perhaps TS_ADDR records.)
 E. All the files being dumped (see item C above).
 F. A `TS_END' record.

The boundary between directories and files is defined implicitly by the
first non-directory on the tape. This is why the `dirdump' routine is so
important for dumps of active file systems. Restore would have to be
made much smarter to recover from `embedded' files in the directory
area, and would still have to read the entire dump, not just the
directory part, to be sure it got them all.

If a dump requires more than one tape, each tape after the first begins
with a TS_TAPE record followed by the same bitmap as in C above. (In
theory this allows restore to `pick up' in the middle. In practice, a
data block that sufficiently resembles a TS_INODE record will fool a
restore that is doing this.
The 4.3-reno dump has a DR_NEWHEADER flag and new fields in the TS_TAPE
record that tell how far restore has to go to get to a real TS_INODE
record, which avoids this problem.)

Dump decides which files (including directories) to dump by checking
the inode times (atime, mtime, ctime, although the ctime alone should
suffice). It reads a batch of inodes from the raw disk device and pokes
through them, reads another batch, and so on, until it has read them
all. Each file that must be dumped sets a bit in the `files to dump'
map. This is `pass I (regular files)'.

Next, dump scans through all the inodes again, this time checking
whether it needs to add any parent directories so as to reach the marked
inodes. It repeats this secondary scan until nothing more is marked.
This is `pass II (directories)'. A directory gets marked only if one of
its entries was already marked when the scan reached it, so deep paths
need several scans, which is why pass II is usually run three or four
times. If a file with several links changes, all the directories
leading to it are put on the tape. (To make pass II run lots of times,

    mkdir a a/b a/b/c a/b/c/d a/b/c/d/e a/b/c/d/e/f a/b/c/d/e/f/g a/b/c/d/e/f/g/h

and do a full backup, then touch a/b/c/d/e/f/g/h/i and do an
incremental backup.)

I added a hack, included in the latest BSD dump, that avoids pass II
entirely if all directories are being dumped (this speeds up all level 0
dumps). (To make it pretty, it still claims to run pass II. You can tell
that you have this version by the fact that `dump 0 ...' prints
`pass I', runs for a while, then prints `pass II' and `pass III' without
pausing in between.)

In pass III, dump actually writes to the tape all the directories it
marked in passes I and II, and in pass IV it writes all the other files
it marked (including devices and symlinks).

In order to make a consistent backup, dump would have to:

 1. Scan the disk for files to back up.
 2. Write the backup to a staging area.
 3. Use file-system calls (lstat()) to check up on everything written
    to the staging area.
 4. For each file changed since step 1 (or, on later rounds, since the
    previous check), replace its backup in the staging area, and add
    any new directories required. For each file deleted since then,
    effectively remove it from the staging area. Repeat from step 3
    until no files have changed or been removed.
 5. Dump the staging area to the backup device.

The date of this dump would be the time at which the final scan in
step 3 (the one that found no changes) began.

A much simpler method would be to freeze activity on the file system
being dumped. A `freezefs' system call is being contemplated.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain: chris@cs.umd.edu Path: uunet!mimsy!chris