Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: ka@hropus.UUCP (Kenneth Almquist)
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8276@ut-sally.UUCP>
Date: Mon, 15-Jun-87 15:52:11 EDT
Article-I.D.: ut-sally.8276
Posted: Mon Jun 15 15:52:11 1987
Date-Received: Sun, 21-Jun-87 06:44:21 EDT
References: <8188@ut-sally.UUCP> <8208@ut-sally.UUCP> <8249@ut-sally.UUCP>
Sender: std-unix@ut-sally.UUCP
Reply-To: ka@hropus.UUCP (Kenneth Almquist)
Organization: Bell Labs, Holmdel, NJ
Lines: 60
Approved: jsq@sally.utexas.edu (Moderator, John Quarterman)
Summary: Tar format makes correct handling of links impossible.

From: ka@hropus.UUCP (Kenneth Almquist)

> [ We are discussing standardizing a data interchange/archive format
> in a standard that its authors explicitly wanted to be implementable
> on hosted, i.e., non-UNIX-based, systems.  The inclusion of inode
> numbers is a problem for such implementations, especially when it
> is not necessary, as demonstrated by the tar format.  -mod ]

Several people have suggested that tar's method of handling links is
better than cpio's.  After looking at the tar format, I wondered how
tar could possibly handle links correctly.  A quick experiment showed
that it doesn't.  Try the following:

	> file1
	ln file1 file2
	tar -cf archive file1 file2
	rm file1 file2
	tar -xf archive file2

The second tar command will fail because tar will simply try to create
a link from file1 to file2, but since I only requested that file2 be
extracted file1 does not exist.

I claim that this is a bug in the tar archive format rather than just
the tar program.  Consider what tar must do to function correctly.
Tar could remember the location of file1 and lseek to it in this
particular example, but in general the input to tar is not a regular
file and thus may not be seekable.
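
To make the seeking problem concrete, here is a minimal sketch in Python
that builds the same two-name archive and reports what each entry stores.
It uses the standard tarfile module rather than the tar command itself,
and it gives file1 six bytes of content so that the sizes are visible
(the experiment above uses an empty file).  The helper name
describe_archive is hypothetical, chosen only for this illustration.

```python
import os
import tarfile
import tempfile


def describe_archive():
    """Archive two hard-linked names and report what each entry stores."""
    with tempfile.TemporaryDirectory() as tmp:
        file1 = os.path.join(tmp, "file1")
        file2 = os.path.join(tmp, "file2")

        # Assumption: non-empty contents, unlike the empty file1 in the post.
        with open(file1, "wb") as f:
            f.write(b"hello\n")
        os.link(file1, file2)  # equivalent of: ln file1 file2

        # Equivalent of: tar -cf archive file1 file2
        archive = os.path.join(tmp, "archive")
        with tarfile.open(archive, "w") as tar:
            tar.add(file1, arcname="file1")
            tar.add(file2, arcname="file2")

        # Read the headers back: name, link flag, link target, data size.
        with tarfile.open(archive, "r") as tar:
            return [(m.name, m.islnk(), m.linkname, m.size)
                    for m in tar.getmembers()]


if __name__ == "__main__":
    for name, is_link, target, size in describe_archive():
        kind = "link to " + target if is_link else "regular file"
        print(name, kind, size)
```

On a filesystem that supports hard links, the file2 entry should come
back flagged as a link to file1 with a data size of zero, so the only
copy of the contents sits in the file1 entry that precedes it.  A reader
that has already skipped past file1 on a non-seekable input has nothing
left from which to create file2.
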
The best bug fix that I could come up with is to make tar write the
contents of all files that it does not extract to a temporary file.
This is unsatisfactory because a user who tried to extract a single
file from a 32 megabyte tape would almost certainly run out of disk
space.  So it seems to me that tar cannot be made to handle links
correctly unless the tar archive format is changed.  The cpio format,
on the other hand, allows links to be handled correctly.

The fact that cpio includes inode numbers is not all that major a
problem for non-UNIX based systems.  Since the only thing the inode
numbers are used for is resolving links, a system which does not
support (non-symbolic) links can leave garbage in the inode field when
writing tapes.  A system which does have links but does not have inode
numbers can use a sequence number in place of the inode number.

I recognize that users will very rarely encounter this bug in tar, but
I still view it as a serious problem in a *standard*.  The question is
not whether this bug in tar desperately needs to be fixed (which it
doesn't), but whether it is reasonable to expect vendors selling cpio
to deliberately introduce a bug into cpio.  Unless someone can suggest
a good way to make cpio use the tar format and still work correctly,
vendors will have to do just that to be compatible with the new
standard.

I wonder if there is still any chance of a new interchange format that
corrected the deficiencies of both cpio and tar being accepted as the
standard.  Assuming someone could be found to write a public domain
implementation of the new format, would that be sufficient to make it
a reasonable alternative to the existing implementations?
				Kenneth Almquist

Volume-Number: Volume 11, Number 66