Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!iuvax!bsu-cs!dhesi
From: dhesi@bsu-cs.UUCP (Rahul Dhesi)
Newsgroups: comp.sys.ibm.pc
Subject: Re: File packaging and compression
Message-ID: <4225@bsu-cs.UUCP>
Date: 6 Oct 88 18:54:00 GMT
References: <259@jato.Jpl.Nasa.Gov>
Reply-To: dhesi@bsu-cs.UUCP (Rahul Dhesi)
Organization: CS Dept, Ball St U, Muncie, Indiana
Lines: 40

In article <259@jato.Jpl.Nasa.Gov> jbrown@jato.UUCP (Jordan Brown) writes:
>I'm considering building a PUBLIC DOMAIN (that means *no* restrictions on
>anything) file packaging and compression program.

Think about the following issue carefully.

If the archive is a concatenation of files, like the cpio, tar, and arc
formats, then updating it requires copying the whole archive. If the
archive contains more structure, e.g. a linked list of directory entries
like the zoo format, then updates need direct-access writes but let you
avoid copying the whole archive.

Also, if the compressed file is preceded by length information, as in
cpio, tar, and arc, then you can't easily add a compressed file to the
archive without knowing the compressed size *first*, which means
compressing to a temporary file, which I don't like.

Take a look at the way the zmodem protocol works: it does not precede
file data with length information. Instead, it uses an escape sequence
of bytes to denote the end of a file. This may need some tricky
programming, and will slow down the listing of archive contents, but it
will let you add a compressed file directly to an archive without
creating a temporary file first.

The first kind of format (plain concatenation) has the advantage that
archives can be read from and written to standard input/output, allowing
easy use of pipes in UNIX. The second kind (more structure) has the
advantage that users with limited disk space can still create and update
large archives, and updating a large archive by adding a tiny file does
not need much overhead in CPU or I/O time.

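To make the escape-sequence idea concrete, here is a rough Python sketch of an archive whose members end with an escape marker instead of starting with a length. Everything specific in it is an assumption made for illustration: the byte values ESC, LITERAL, and END, the function names, and the use of zlib as a stand-in compressor. It is not the zmodem encoding and not any existing archive format.

```python
import io
import zlib

ESC = 0x18      # escape byte (arbitrary choice for this sketch)
LITERAL = 0x00  # ESC LITERAL -> one literal ESC byte inside the data
END = 0x01      # ESC END     -> end of the current field


def stuff(chunk: bytes) -> bytes:
    """Escape every ESC byte so data can never imitate the end marker."""
    return chunk.replace(bytes([ESC]), bytes([ESC, LITERAL]))


def append_member(archive, name: str, source, chunk_size: int = 4096) -> None:
    """Compress `source` straight into `archive`.

    No length is written, so the compressed size need not be known in
    advance and no temporary file is needed.
    """
    archive.write(stuff(name.encode("utf-8")) + bytes([ESC, END]))
    packer = zlib.compressobj()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        archive.write(stuff(packer.compress(chunk)))
    archive.write(stuff(packer.flush()) + bytes([ESC, END]))


def read_field(archive):
    """Read one escape-terminated field; return None at a clean end of archive."""
    out = bytearray()
    while True:
        b = archive.read(1)
        if not b:
            if out:
                raise ValueError("truncated archive")
            return None
        if b[0] != ESC:
            out.append(b[0])
            continue
        code = archive.read(1)
        if code == bytes([END]):
            return bytes(out)
        if code == bytes([LITERAL]):
            out.append(ESC)
        else:
            raise ValueError("bad escape sequence")


def read_members(archive):
    """Yield (name, data) pairs. Every byte must be scanned to find the
    next member, which is why listing is slower than with length fields."""
    while True:
        name = read_field(archive)
        if name is None:
            return
        body = read_field(archive)
        if body is None:
            raise ValueError("truncated archive")
        yield name.decode("utf-8"), zlib.decompress(body)


# Usage: append two members one after the other, then read them back.
archive = io.BytesIO()
append_member(archive, "a.txt", io.BytesIO(b"hello " * 1000))
append_member(archive, "b.bin", io.BytesIO(bytes([ESC, ESC, 1, 0, ESC]) * 50))
archive.seek(0)
members = list(read_members(archive))
```

Because the writer only ever appends, this works on a pipe as well as on a file. The cost is the one mentioned above: a reader cannot skip over a member, so listing the archive means scanning all of it.
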
(The tar format allows appending a file to a tar archive, but then you
can get two instances of the same file in the archive, and to extract
the file you extract both and let the second one overwrite the first --
not very elegant.)

If you can combine the advantages of both in an easy way, you have
achieved something very useful.
--
Rahul Dhesi         UUCP: !{iuvax,pur-ee}!bsu-cs!dhesi