Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: trb@ima.ISC.COM (Andrew Tannenbaum)
Newsgroups: comp.std.unix
Subject: Re: tar or cpio?
Message-ID: <8126@ut-sally.UUCP>
Date: Wed, 20-May-87 16:37:12 EDT
Article-I.D.: ut-sally.8126
Posted: Wed May 20 16:37:12 1987
Date-Received: Sun, 24-May-87 21:38:51 EDT
References: <8001@ut-sally.UUCP>
Sender: std-unix@ut-sally.UUCP
Reply-To: trb@ima.ISC.COM (Andrew Tannenbaum)
Organization: Interactive Systems, Boston, MA
Lines: 98
Approved: jsq@sally.utexas.edu (Moderator, John Quarterman)

From: trb@ima.ISC.COM (Andrew Tannenbaum)

I don't have Section 10 of the POSIX Trial Use Standard, but I am
interested in what happens to tar and cpio in POSIX.

I see that the netnews discussion of this has been partly a popularity
contest between tar and cpio. There are more important issues to
discuss than people's provincial biases. If you come from BSD land,
you probably like tar. If you come from AT&T land, you probably like
cpio.

I have some comments about cpio, since it is my personal favorite.
They apply to both the file format and to the program function. Some
comments apply to tar as well.

I like the idea of cpio taking a list of files on stdin. I wish tar
had this option.

	tar cv `find / -print | fgrep -v -f except.file`

doesn't cut it.

[ Evidently John Gilmore's public domain implementation that he posted
to comp.sources has this. I know of no proprietary version that
does. -mod ]

cpio's binary format should have been killed off long ago. cpio has a
'portable' format, which still has several problems:

- Byte swapping and its friends. There are systems which swap bytes
  and/or halfwords. There are even systems which xor 0 and 1 bits on
  tapes. If CPIO wrote a magic number 0x12345678 in the header, it
  could resolve these problems painlessly.

- I agree that the binary cpio header is silly.
  The portable header is all printable ASCII data, but the filename is
  terminated by a null, which makes it harder to play with the
  archive. Here is a shar-like program which makes a cpio archive
  which can unpack itself.

<<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>>
#! /bin/sh
# take a list of files on stdin and make them into a bundle which
# can be passed through sh to extract them.
cat << \!
#!/bin/sh
# cpio archive
(read a; read a; read a; read a; cat) < $0 | cpio -icdm
exit
!
cpio -oac
<<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>>

  As I recall, it can have problems because the filename is
  null-terminated, for instance if you try to read its output into a
  mail message with an editor.

  It would also be neighborly if the ASCII header were more human
  readable; a space or carriage return here or there wouldn't hurt at
  all. I realize that this is a standardization effort, but if you are
  going to enhance the format for some portability reason, you might
  want to consider my enhancement suggestions.

- The familiar problems with damaged archives should be fixed
  (Out of phase--get help).

- There are systems which need to extend the archive formats in local
  ways, for instance, to add extra mode information for a secure UNIX
  implementation, or file type information when the UNIX system deals
  with other types of file systems. It would be very useful to have a
  compatible way to extend the header such that any system could check
  the local field and either use or ignore the information. Right now,
  there is little hope for compatibility; the only solutions I can
  think of (various kinds of shadow files which contain mode info) are
  quite kludgy.

- These programs should deal with multiple tape archives in a standard
  way. I have seen many local hacks to do this.

- Blocksizes for speed, space, and streaming efficiency are best
  handled by blocking filters rather than by hacks like -B.
  I have heard of ctccpio, but can only worry about what it actually
  is. How many programs are going to have to have knowledge about how
  many goofy devices? I don't understand why cpio has to know anything
  about a device. This is UNIX, isn't it? Which brings us back to the
  question of multi-tape archives: maybe the blocking filter should
  also handle the multi-tape problem? This means lots of data travels
  over a pipe. Modern OSes should be able to do something smart here.
  (The multi-tape question leaves me with a queasy feeling.)

I would like to see a discussion about tar and cpio rather than
opinions about which is better. I am particularly concerned about
extending the header format to deal with atypical file types.

	Andrew Tannenbaum   Interactive   Boston, MA   +1 617 247 1155

Volume-Number: Volume 11, Number 34