Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: guy@sun.com (Guy Harris)
Newsgroups: comp.std.unix
Subject: Re: tar or cpio?
Message-ID: <8006@ut-sally.UUCP>
Date: Sun, 10-May-87 03:27:25 EDT
Article-I.D.: ut-sally.8006
Posted: Sun May 10 03:27:25 1987
Date-Received: Mon, 11-May-87 04:49:10 EDT
References: <8001@ut-sally.UUCP>
Sender: std-unix@ut-sally.UUCP
Reply-To: guy@sun.com (Guy Harris)
Lines: 120
Approved: jsq@sally.utexas.edu (Moderator, John Quarterman)

From guy@sun.com Sun May 10 02:30:14 1987
From: guy@sun.com (Guy Harris)

> As the moderator of this newsgroup, I solicit comments about what should
> be done with section 10.

One thing that should not be done, under any circumstances, is to
replace "tar" with "cpio" - *especially* if it includes the old
non-"-c" form.  The non-portable form is completely useless for moving
data between systems with different byte orders unless you have a
clever "cpio" that figures out that the byte order is backwards and
undoes the damage.  I discovered this when trying to read a "cpio" tape
made on a VAX in the old format; no combination of "cpio" byte-swapping
options and "dd conv=swab" would help.  I finally ended up fixing our
"cpio" to do the aforementioned look-at-the-header-and-undo-the-damage
stuff.

The X/OPEN standard uses "cpio".  The rationale given exhibits a
distressing degree of incompetence:

	If an exchange medium is to be read on a target machine that
	is architecturally different from the source machine, problems
	may arise concerning the ordering of bytes within a word and
	words within a long word (see the portability guides in Part
	III).  These can easily be handled when using "cpio" as an
	exchange utility, while with "tar" it may be a little more
	difficult.

Now, I will first note here that the *only* time I had a problem moving
"tar" tapes between machines was when I had to move things to a Plexus.
The problem was *not* that the machines had different byte orders; the
problem was that the Plexus had a typical brain-damaged Multibus tape
controller that swapped bytes when it transferred data to and from
memory.  "cpio" would not have made this any easier; the System III
byte-swapping option did not swap the bytes on *all* blocks read, but
just swapped the bytes on data blocks and in file names.  The intent
here was clearly that you would read a tape written on a machine with a
different byte order by doing something like

	dd if=/dev/rmt0 conv=swab | cpio -ids

"dd" would swap everything; "cpio -s" would un-swap everything but the
binary data in the header.  (We pause to note that merely swapping the
binary data in the header would be much more efficient, especially
given that "dd" is somewhat of a pig.)  This works, but is less than
wonderful.  (And it doesn't solve the problem with the Plexus; to solve
that you just stick the "dd" in front of "cpio" and don't bother with
"-s" at all.)

The System V "cpio" byte-swapping and word-swapping options work *only*
on data blocks; they have no effect whatsoever on binary data in the
header or on file names.  This means that the trick that worked with
the System III "cpio" wouldn't work at all - and the problem with the
Plexus still isn't fixed, if that was the intent.  The S5 options are
useless for old-style non-"-c" tapes.  They are of some use with "-c"
tapes - but only if all the files on the tape consist solely of
"short"s or "long"s, since the data in the data blocks are all
byte-swapped or word-swapped in the same fashion.  Most files I tend to
put on or extract from "cpio" tapes are text files, which obviously
need no swapping.

In short, the arguments offered by X/OPEN in favor of "cpio" are
completely bogus.  Now for the arguments against "cpio" format:

1) It is somewhat more UNIX-specific, in that the "mode" field of the
   "stat" structure is written out numerically.
   POSIX does not specify required numeric values for this field.
   "tar" indicates the file type with a standard symbolic code, so you
   can read "tar" tapes even if the machine on which the tape was
   written and the machine on which it is being read do not have the
   same values for this field.

2) It does not handle hard links particularly elegantly.  "cpio" knows
   nothing of files with multiple hard links when it writes a tape; if
   it is told to write "foo" and "bar" to the tape, and they are both
   hard links to the same file, it writes two copies of this file to
   the tape.  The hard links are established when the tape is read.
   If the files appear on the tape in the order "foo" and then "bar",
   "foo" will be read in first.  Once "bar" has been read in, "cpio"
   will check to see if it has already read in a file with the same
   dev/inumber value.  If so, it will delete "bar" and make a hard
   link to "foo" called "bar".

3) It is less common.  Almost all UNIX systems that support "cpio" also
   support "tar"; many UNIX systems that support "tar" do not support
   "cpio".

4) POSIX has already chosen "tar" format; why should it change horses
   in midstream, especially given that the new horse is lame and,
   despite the claims made by the person selling the horse, is not
   capable of pulling any heavier loads than the existing one?

Anyway, I'll have to dig up the proposal made to POSIX that "cpio"
supplement or replace "tar" and cast a very strong "no" vote citing the
above.

Now, as for the proposal for handing the whole thing off to P1003.2 - I
have some inclination to support this.  It could, in some ways, be
considered neither part of the scope of P1003.1 nor of P1003.2, but to
be a separate standardization topic entirely.  However, if I had to
choose which of the two items - C-language binding to OS system call
and library functions, or command-language functions - the data
interchange standard belonged to, I'd vote in favor of the latter.
There is no library of functions for reading or writing "tar" tapes,
but there is a command (namely, "tar") for reading and writing them, so
I think it belongs in that category - especially given that Section 10
currently says "A conforming system shall implement a user utility..."
which really sounds a lot more like a P1003.2 requirement than a
P1003.1 requirement.

Volume-Number: Volume 11, Number 9
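
For readers who want to see what the look-at-the-header-and-undo-the-damage
idea from the article might look like, here is a minimal sketch in Python.
It is not the change actually made to the "cpio" described above.  It
assumes the old non-"-c" header is thirteen 16-bit fields beginning with
the magic number 070707 (octal), with the modification time and file size
each stored as two 16-bit halves, high-order half first.  The function
name and the field names are made up for the example.

```python
import struct

CPIO_MAGIC = 0o070707   # magic number at the start of an old binary header
HEADER_SIZE = 26        # thirteen 16-bit fields

# Hypothetical field names, chosen for this example only.
FIELDS = ("magic", "dev", "ino", "mode", "uid", "gid", "nlink", "rdev",
          "mtime_hi", "mtime_lo", "namesize", "filesize_hi", "filesize_lo")


def parse_binary_header(raw):
    """Decode a 26-byte old-format header, guessing byte order from the magic."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("short header")
    # Try each byte order; the correct one yields the expected magic number.
    for order, label in (("<", "little"), (">", "big")):
        values = struct.unpack(order + "13H", raw[:HEADER_SIZE])
        if values[0] == CPIO_MAGIC:
            hdr = dict(zip(FIELDS, values))
            hdr["byte_order"] = label
            # Reassemble the 32-bit values from their two 16-bit halves.
            hdr["mtime"] = (hdr.pop("mtime_hi") << 16) | hdr.pop("mtime_lo")
            hdr["filesize"] = (hdr.pop("filesize_hi") << 16) | hdr.pop("filesize_lo")
            return hdr
    raise ValueError("not an old-format binary cpio header")
```

Only the 26 header bytes are reinterpreted here; file names and file data
are left as they are, which is the cheaper approach the article prefers to
running the whole tape through "dd conv=swab".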