Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!think!harvard!cmcl2!phri!delftcc!sam
From: sam@delftcc.UUCP (Sam Kendall)
Newsgroups: net.unix-wizards,net.bugs.usg
Subject: Some thoughts on enhancing cpio(1)
Message-ID: <135@delftcc.UUCP>
Date: Wed, 2-Apr-86 15:29:50 EST
Article-I.D.: delftcc.135
Posted: Wed Apr 2 15:29:50 1986
Date-Received: Sat, 5-Apr-86 10:52:52 EST
Organization: Delft Consulting Corp., New York
Lines: 98
Xref: watmath net.unix-wizards:17456 net.bugs.usg:465

I've had some thoughts recently about features that cpio(1) needs.
Some of these apply to tar(1) also.

(1) Optional error recovery.
If the header of just one file in a cpio archive is munged, cpio will
issue the pitiful message "Out of phase--get help" and terminate. This
message is confusing to ordinary users, and it then takes a guru to
recover the files in the archive past the garbled point. This is a bit
ridiculous. There should be some optional error recovery, like the
ability to retrieve the file following the garbled header (even if its
name is unknown), and then to recognize the next file header in the
garbled archive and proceed from there. This might break down if
another cpio archive were one of the files in the garbled archive, but
no big deal.

(2) Automatic recognition of -c vs. non-"-c" formats.
The -c option could be ignored with -i (copy in); cpio should
recognize which format the archive is in. This is easy to implement.
It complicates error recovery, though, in the case that the beginning
of the file is munged.

(3) Fix the bug that -m (restore file modification times) is
ineffective on directories that are being copied.
This is vital for the next feature:

(4) Optional save and restore of directory contents, with file deletion.
The purpose of this feature is to correctly handle full and
incremental backups with cpio; specifically, to correctly restore a
directory in which files have been removed after the full backup was
made, but before the incremental backup was made.

Currently, when -o (copy out) gets the name of a directory, it outputs
a header for that directory, but no contents. My proposal is for an
option "-D" which would work with both -o and -i. With -o, a list of
files in a directory is saved along with the directory. With -i, when
a directory is being restored and is "replacing" an already existing
directory on disk, all files that are in the existing directory but
NOT in the archived directory are REMOVED.

Another way to look at it: with a cpio -i, the action of a file
replacing an already existing file means, of course, that the archived
contents replace the contents on disk. But there is no corresponding
action for directories. -D adds such an action. N.B.: as with files,
the archived directory will replace the existing directory only if it
is newer or the -u option is given; this is why (3) above is
necessary. -D would also work with -p (pass), of course.

Example: a directory "d" contains files "a" and "b". A full backup
(using cpio) is made including "d" and its contents. The file "b" is
deleted. Now an incremental backup of files that have changed since
the full backup is made using cpio -D. "d" is on the incremental
backup, because it has changed since the full backup was made. (It
changed when "b" was deleted.) Now suppose "d" is lost on disk, and we
try to restore it to disk from backup. We first restore the full
backup; "d" contains "a" and "b" again. We next restore the
incremental backup. On the incremental backup, "d" contains "a" but
not "b". So "b" is deleted from disk. The restore has worked
correctly. With the current cpio, "b" would still exist, incorrectly,
after the incremental backup was restored.

This is extremely useful for backup purposes.
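
To make the restore half of the example concrete, here is a rough
sketch in Python of what -i with the proposed -D might do once it has
decided that an archived directory replaces the one on disk. Everything
in it is an assumption made for illustration: -D is only a proposal, the
function name prune_to_archived_listing is hypothetical, and the saved
directory listing is modelled as a plain list of names. The "newer or
-u" check is left out.

```python
import os
import shutil
import tempfile


def prune_to_archived_listing(directory, archived_names):
    """Remove entries of `directory` that are absent from `archived_names`.

    Hypothetical helper: models the deletion step of the proposed -D
    option. Returns the sorted list of names that were removed.
    """
    keep = set(archived_names)
    removed = []
    for name in sorted(os.listdir(directory)):
        if name in keep:
            continue
        path = os.path.join(directory, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # subdirectory missing from the archived listing
        else:
            os.remove(path)      # plain file, symlink, and so on
        removed.append(name)
    return removed


def demo():
    """Replay the "d" / "a" / "b" example in a scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        d = os.path.join(scratch, "d")
        os.mkdir(d)
        # State after restoring the full backup: "d" holds "a" and "b".
        for name in ("a", "b"):
            with open(os.path.join(d, name), "w") as f:
                f.write(name + "\n")
        # The incremental backup recorded that "d" held only "a".
        removed = prune_to_archived_listing(d, ["a"])
        return removed, sorted(os.listdir(d))
```

Calling demo() walks through the scenario: after the incremental
listing is applied, "b" is gone and only "a" is left in "d".
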
It sounds complicated, but it fits in beautifully.

(5) Preservation of printable ASCII + short lines.
It is too late for this, since the format is already frozen, but it
would have been good. The idea here is that an archive of mailable
files should be itself mailable, except perhaps for its size. A file
that is mailable has only printable ASCII characters, and has no lines
longer than some length, maybe 80 characters (I'm not sure). A cpio -c
archive has headers which are about 80 characters plus the length of
the pathname; this can get too long. Also, the header includes a NUL
character or two. I wish someone had thought about this a little bit
more before designing the format. It is so close to preserving
mailability! Of course, "shar", and also Martin Minow's (decvax!minow;
I think it's his) "arch" programs do preserve mailability in almost
all cases.

(6) Should be public domain.
This would avoid the annoying scenario where people get cpio archives
but cannot unpack them.

I haven't recommended that checksums be introduced into cpio, because
I think this can be handled by some other filter. (There are some
tools to package software for transmission, available through the AT&T
Toolchest, that probably do what I want here.) One could argue that
mailability can also be handled by other filters; but I would rather
keep things simple for unpacking mailed archives.

Comments?

----
Sam Kendall                     { ihnp4 | seismo!cmcl2 }!delftcc!sam
Delft Consulting Corp.          ARPA: delftcc!sam@NYU.ARPA