Path: utzoo!utgpu!water!watmath!clyde!att!chinet!les
From: les@chinet.UUCP (Leslie Mikesell)
Newsgroups: comp.sys.ibm.pc
Subject: Re: File packaging and compression
Message-ID: <6751@chinet.UUCP>
Date: 7 Oct 88 20:04:33 GMT
References: <259@jato.Jpl.Nasa.Gov> <4225@bsu-cs.UUCP>
Reply-To: les@chinet.chi.il.us (Leslie Mikesell)
Organization: Chinet - Public Access Unix
Lines: 38

In article <4225@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>If the archive is a concatenation of files like the cpio, tar, and arc
>formats, then updating it requires copying the whole archive.
>
>If the archive contains more structure, e.g. a linked list of directory
>entries like the zoo format, then updates need direct access writes but
>allow you to avoid copying the whole archive.

>Also, if the compressed file is preceded by length information, as in
>cpio, tar, and arc, then you can't easily add a compressed file to the
>archive without knowing the compressed size *first*, which means
>compressing to a temporary file, which I don't like.

There is also a problem with cpio, even without compression: if the
length of the file changes between writing the cpio header and reading
the end of the file, the rest of the archive is corrupted.  I think the
archiver should work in a streaming mode if necessary, so that it can
handle tape drives that don't seek, but there should be a length field
that can be filled in if you can seek on the media.  Your idea of a
magic escape sequence to mark the end of an entry solves 2 problems -
the file length where you can't seek on the device, and also the
problem of re-syncing on an archive with a corrupted entry or part of
a multi-volume set.

The program could also keep a separate (optional) directory in another
file or tacked on to the end of the archive.  This could be used for
several purposes, with obvious advantages when the archive spans
volumes.
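
To make the escape-sequence idea above concrete, here is a minimal
sketch in Python.  It is not any existing archive format.  The byte
values ESC, LITERAL and END and the names write_entry, read_entry and
resync are all made up for illustration.  The assumption is simple byte
stuffing: a literal ESC in the data is written as ESC LITERAL, so the
pair ESC END can only appear at the end of an entry.

```python
import io

# Hypothetical byte values, chosen only for this sketch.
ESC = 0x1B      # escape byte
LITERAL = 0x00  # ESC LITERAL stands for one literal ESC byte in the data
END = 0x01      # ESC END marks the end of an entry


def write_entry(out, data_stream, chunk_size=4096):
    """Copy data_stream to out with ESC bytes stuffed, then write ESC END.

    The length is never needed in advance, so this works on output
    that cannot seek.
    """
    while True:
        chunk = data_stream.read(chunk_size)
        if not chunk:
            break
        out.write(chunk.replace(bytes([ESC]), bytes([ESC, LITERAL])))
    out.write(bytes([ESC, END]))


def read_entry(inp):
    """Return the next entry's bytes, or None at a clean end of archive."""
    data = bytearray()
    while True:
        b = inp.read(1)
        if not b:
            if data:
                raise ValueError("archive ends inside an entry")
            return None
        if b[0] != ESC:
            data.append(b[0])
            continue
        nxt = inp.read(1)
        if not nxt:
            raise ValueError("archive ends inside an escape sequence")
        if nxt[0] == LITERAL:
            data.append(ESC)
        elif nxt[0] == END:
            return bytes(data)
        else:
            raise ValueError("corrupt escape sequence")


def resync(inp):
    """Skip past the next ESC END pair; return True if one was found."""
    prev = None
    while True:
        b = inp.read(1)
        if not b:
            return False
        if prev == ESC and b[0] == END:
            return True
        prev = b[0]


# Small round trip through an in-memory "tape".
tape = io.BytesIO()
write_entry(tape, io.BytesIO(b"first" + bytes([ESC, END]) + b"entry"))
write_entry(tape, io.BytesIO(b"second entry"))
tape.seek(0)
entries = [read_entry(tape), read_entry(tape)]
```

After a damaged entry, a reader would call resync and carry on with
read_entry from the next entry boundary.  The cost is one extra byte
for every ESC that occurs in the data.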

A minor extension would be to allow the directory portion to contain
entries for files that are not contained in the archive, which would
allow (a) preserving links that would otherwise not be possible and
(b) restoring a directory tree to exactly the condition that it was in
when the last incremental backup was done (i.e. delete extraneous
files that had been deleted before the incremental but still existed
on the last full backup or intermediate incrementals).  A fairly
simple program could manipulate the information from the directory
files to determine where to find archive copies (disk n of set xxx)
and also determine exactly which files need to be copied in an
incremental backup.

Les Mikesell
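
A rough sketch of the directory bookkeeping described above, again in
Python and again purely illustrative.  The Entry record, the dict
layout (path mapped to Entry) and the function names plan_incremental
and plan_restore are assumptions, not part of any real archiver.  An
entry whose volume is None stands for a file that is listed in the
directory but not stored in that archive.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    mtime: int              # modification time recorded at backup
    volume: Optional[str]   # e.g. "disk 2 of set full"; None if only listed


def plan_incremental(previous, current_mtimes):
    """Return the paths that are new or changed since the previous directory."""
    return sorted(
        path for path, mtime in current_mtimes.items()
        if path not in previous or previous[path].mtime != mtime
    )


def plan_restore(directories):
    """Plan a restore from directory files ordered oldest (full) to newest.

    Returns (fetch, delete).  fetch maps every path in the newest
    directory to the volume holding its most recent stored copy.
    delete lists paths that earlier backups would bring back but that
    no longer existed at the newest backup.
    """
    newest = directories[-1]
    fetch = {}
    for path in newest:
        for directory in reversed(directories):
            entry = directory.get(path)
            if entry is not None and entry.volume is not None:
                fetch[path] = entry.volume
                break
    seen_earlier = set()
    for directory in directories[:-1]:
        seen_earlier.update(directory)
    delete = sorted(seen_earlier - set(newest))
    return fetch, delete


# Hypothetical directory files: one full backup, one incremental.
full = {
    "a": Entry(1, "disk 1 of set full"),
    "b": Entry(1, "disk 1 of set full"),
    "c": Entry(1, "disk 2 of set full"),
}
incr = {
    "a": Entry(1, None),                    # unchanged, listed only
    "b": Entry(5, "disk 1 of set incr1"),   # modified
    "d": Entry(5, "disk 1 of set incr1"),   # new
}
fetch, delete = plan_restore([full, incr])
```

Here "c" was deleted before the incremental, so it lands in delete, and
"a" is fetched from the full set because the incremental only lists it.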