Path: utzoo!utgpu!water!watmath!clyde!att!chinet!les
From: les@chinet.UUCP (Leslie Mikesell)
Newsgroups: comp.sys.ibm.pc
Subject: Re: File packaging and compression
Message-ID: <6751@chinet.UUCP>
Date: 7 Oct 88 20:04:33 GMT
References: <259@jato.Jpl.Nasa.Gov> <4225@bsu-cs.UUCP>
Reply-To: les@chinet.chi.il.us (Leslie Mikesell)
Organization: Chinet - Public Access Unix
Lines: 38

In article <4225@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:

>If the archive is a concatenation of files like the cpio, tar, and arc
>formats, then updating it requires copying the whole archive.
>
>If the archive contains more structure, e.g. a linked list of directory
>entries like the zoo format, then updates need direct access writes but
>allow you to avoid copying the whole archive.

>Also, if the compressed file is preceded by length information, as in
>cpio, tar, and arc, then you can't easily add a compressed file to the
>archive without knowing the compressed size *first*, which means
>compressing to a temporary file, which I don't like.

Even without compression, cpio has a related problem: if the length of
a file changes between writing the cpio header and reading the end of
the file, the rest of the archive is corrupted.  I think the
archiver should work in a streaming mode if necessary so that it can
handle tape drives that don't seek, but there should be a length field
that can be filled in if you can seek on the media.  Your idea of a
magic escape sequence to mark the end of an entry solves two problems:
recording the file length when you can't seek on the device, and
re-syncing on an archive that has a corrupted entry or is only part of
a multi-volume set.  The program could also keep an optional separate
directory in
another file or tacked on to the end of the archive.  This could be used
for several purposes, with obvious advantages when the archive spans
volumes.  A minor extension would be to allow the directory portion to
contain entries for files that are not contained in the archive, which
would allow (a) preserving links that could not otherwise be preserved
and (b) restoring a directory tree to exactly the condition it was in
when the last incremental backup was done (i.e. deleting extraneous
files that had been deleted before the incremental but still existed on
the last full backup or intermediate incrementals).  A fairly simple
program could
manipulate the information from the directory files to determine where to
find archive copies (disk n of set xxx) and also determine exactly which
files need to be copied in an incremental backup.
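
To make the end-of-entry marker idea concrete, here is a rough sketch in
Python.  Nothing in it comes from an existing archiver: the byte values
(ESC, LIT, EOH, END) and the function names are all made up for
illustration.  It assumes simple byte stuffing, so the two-byte sequence
ESC END can never occur inside a name or file data.  That lets the
writer stream an entry without knowing its length, and lets a reader
that has lost its place skip ahead to the next entry boundary.  The
optional seek-back length field and the separate directory are left out.

```python
import io

# Hypothetical byte values, chosen only for this sketch.
ESC = 0xFF   # escape byte
END = 0x00   # ESC END -> end of an entry (follows the file data)
LIT = 0x01   # ESC LIT -> one literal ESC byte in the name or data
EOH = 0x02   # ESC EOH -> end of the entry's name (header)


def _stuff(data):
    """Escape ESC bytes so ESC END / ESC EOH only appear as markers."""
    return data.replace(bytes([ESC]), bytes([ESC, LIT]))


def write_entry(out, name, src, bufsize=4096):
    """Stream one entry to `out`; the length is never needed in advance."""
    out.write(_stuff(name.encode("utf-8")) + bytes([ESC, EOH]))
    while True:
        chunk = src.read(bufsize)
        if not chunk:
            break
        out.write(_stuff(chunk))
    out.write(bytes([ESC, END]))


def _read_field(inp, terminator):
    """Read up to ESC <terminator>; return None on EOF before any byte."""
    buf = bytearray()
    while True:
        b = inp.read(1)
        if not b:
            if buf:
                raise ValueError("archive truncated inside an entry")
            return None
        if b[0] != ESC:
            buf.append(b[0])
            continue
        nxt = inp.read(1)
        code = nxt[0] if nxt else None
        if code == terminator:
            return bytes(buf)
        if code == LIT:
            buf.append(ESC)
        else:
            raise ValueError("unexpected escape sequence")


def read_entries(inp):
    """Yield (name, data) pairs until a clean end of archive."""
    while True:
        name = _read_field(inp, EOH)
        if name is None:
            return
        data = _read_field(inp, END)
        if data is None:
            raise ValueError("archive truncated after a name")
        yield name.decode("utf-8"), data


def resync(inp):
    """Skip to just past the next ESC END; return False if none is left."""
    prev = None
    while True:
        b = inp.read(1)
        if not b:
            return False
        if prev == ESC and b[0] == END:
            return True
        prev = b[0]
```

After a read error (or when starting in the middle of a volume), calling
resync() and then read_entries() again picks up at the next intact
entry.  Damaged bytes can of course fake a marker, so a real format
would probably want a checksum per entry as well.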

Les Mikesell