Path: utzoo!mnetor!uunet!husc6!hao!boulder!sunybcs!bingvaxu!leah!rsb584
From: rsb584@leah.Albany.Edu ( Raymond S Brand)
Newsgroups: comp.sys.amiga
Subject: Re: IFF archive proposal
Message-ID: <564@leah.Albany.Edu>
Date: 17 Jan 88 17:14:49 GMT
References: <6173@j.cc.purdue.edu>
Organization: The University at Albany, Computer Services Center
Lines: 105
Summary: Comments

>    ANAM - Archiver name.  This contains a null-terminated string telling
> the name of the program that created this archive.  This chunk, if included
> at all, should only be included in the top-level ARC chunk.

Why is ANAM needed at all?

>    NAME - Filename.  This chunk contains a filename terminated with a null
> byte.  This is a filename which the file will go into.  This is a required
> chunk.  There is still a limit to the length of the filename - 2,147,483,648
> characters to be exact.  You should not need filenames longer than this.
> This was a major issue about the old ARC program which is dealt with here.

31 bits is rather large; don't you think 15 or even 7 would be enough? Names
should be limited to ASCII characters in the range 20h to 7Eh, with the
archive extractor responsible for checking that a name is acceptable to the
system it is being extracted on (think about other systems that could
benefit from the new archive format).
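
To make the suggestion concrete, here is a rough sketch of the portable
check. The function name and the 127 byte cap (what a 7 bit length field
would allow) are my own assumptions and are not part of the proposal.

```python
# Hypothetical limit: a 7 bit length field caps names at 127 bytes.
MAX_NAME_LEN = 127


def is_portable_name(name: bytes, max_len: int = MAX_NAME_LEN) -> bool:
    """Return True if name is non-empty, short enough, and printable ASCII."""
    if not 0 < len(name) <= max_len:
        return False
    # 0x20 (space) through 0x7E (tilde), inclusive.
    return all(0x20 <= b <= 0x7E for b in name)
```

An extractor would still apply its own system-specific rules on top of this
(reserved characters, shorter length limits, and so on).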

>    CRC - CRC check.  This chunk was modified from its original definition
> to accomodate multiple program sections.  The chunk contains as many words
> of data as there are sections in the file - one CRC for each section.  If
> there are too many CRC words, an unarchiver will ignore the rest.  If there
> are too few, the unarchiver will check only the sections with CRCs
> supplied.  If there is no CRC chunks, no checking will be done.


CRCs are easy enough to do that they should be considered mandatory. Too many
or too few should be considered an ERROR.
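
A minimal sketch of the strict behavior I am arguing for. The proposal does
not say which CRC is meant; I am assuming the 16 bit CRC with reflected
polynomial A001h and a zero initial value. The function names are made up
for illustration.

```python
def crc16_arc(data: bytes) -> int:
    """Bitwise CRC-16 (reflected polynomial 0xA001, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def verify_sections(sections: list[bytes], crcs: list[int]) -> None:
    """Raise ValueError on a CRC count mismatch or on any bad CRC."""
    if len(crcs) != len(sections):
        raise ValueError(
            f"expected {len(sections)} CRC words, got {len(crcs)}")
    for index, (section, expected) in enumerate(zip(sections, crcs)):
        actual = crc16_arc(section)
        if actual != expected:
            raise ValueError(
                f"section {index}: CRC {actual:04X}, expected {expected:04X}")
```

The count check comes first so that a truncated or padded CRC chunk is
reported as an error instead of being silently tolerated.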

>    This LEVL - Multilevel marker.  This chunk contains one word of data -
> the number of compression levels in the main chunk.  For example, an
> archiver may detect that a certain file would be much better off if it was
> crunched and then squeezed.  This chunk, if included, indicates the number
> of levels in the main chunk.  If it is a 1, then the main chunk simply
> contains the file.  If it is a two, then the main chunk contains another
> chunk, indicating the same or a different compression method.  Normally
> only one or two levels will be necessary, and usually only one.  However,
> for text files, packing and then crunching may be the best compression
> method.

The need for this escapes me. Almost all compression methods already do run
length encoding as a prestep to a higher level method.

>    CRNC - Crunching.  As of yet, I don't have the docs for this format, but
> as soon as I get them, I'll include them here.


ARC Crunch is 12 bit Lempel-Ziv with run length encoding as a prestep.
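
For anyone unfamiliar with the prestep, here is a bare-bones run length
encoder and decoder. It uses plain (count, byte) pairs purely for
illustration; this is NOT the byte layout ARC writes, and the function names
are my own.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, byte) pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while (i + run < len(data) and run < 255
               and data[i + run] == data[i]):
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Expand (count, byte) pairs back into the original data."""
    out = bytearray()
    for i in range(0, len(packed), 2):
        out += bytes([packed[i + 1]]) * packed[i]
    return bytes(out)
```

The output of such a prestep is what the Lempel-Ziv stage would then
compress.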

>    SQEZ - Squeezing.  Ditto.

ARC Squeeze is Huffman encoding with run length encoding as a prestep.
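
A short sketch of building a Huffman code table from byte frequencies. This
shows the general technique only; it does not reproduce the table format
that Squeeze stores in the archive, and the function name is hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict[int, str]:
    """Map each byte value in data to a prefix-free bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one bit code.
        return {next(iter(freq)): "0"}
    # Heap items are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(count, n, {sym: ""})
            for n, (sym, count) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing a bit to each side.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

Frequent bytes end up with short codes and rare bytes with long ones, which
is where the savings come from.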

>    SQSH - Squashing.  This one is controversial.  It is used in PKARC for
> the IBM PC, but hasn't yet made it to the Amiga.  Tell me what you think of
> including it.  It will make archiving programs larger, but whether to
> include it or not depends on whether it will get used very often.  Voice
> your feelings.

I believe that Squash is a 13 bit Lempel-Ziv encoding using a different hash
function than the one used in Crunch (don't quote me on this one).

Typically the benefit of Squash over Crunch is only a few percent, and
usually only on large files.

>    One final note: there is no requirement to sort archived files in any
> way, although archivers may want to sort them for the user's sake.

Sorting makes it easier for the user and the archiving program (the program
doesn't need to search the entire archive for a preexisting entry named GAME
before adding an entry named GAME, etc.).
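
A small sketch of the lookup a sorted archive allows, assuming the archiver
keeps an in-memory list of entry names in sorted order. The list and the
function names are hypothetical and not part of the proposal.

```python
import bisect


def find_entry(names: list[str], name: str) -> int:
    """Return the index of name in the sorted list, or -1 if absent."""
    i = bisect.bisect_left(names, name)
    return i if i < len(names) and names[i] == name else -1


def add_entry(names: list[str], name: str) -> bool:
    """Insert name in sorted position; return False if already present."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return False  # Caller would replace the existing entry instead.
    names.insert(i, name)
    return True
```

With an unsorted archive the same duplicate check means scanning every
entry.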

>    Although this document is not copyrighted or anything, please don't
> redistribute it very much.  This is because it's only a draft, and it will
> probably get changed, and we want EVERYONE to have the same thing.

When you say everyone, that includes MS-DOG users (sysops) too. Who is doing
that version?

>    Please feel free to email suggestions for this file.  Oh, and if someone
> has the docs on crunching and squeezing, please email them to me.  Thanks.

There really are no docs on crunching other than the source. The method used
is derived from the Unix compress utility. Squeezing is Huffman encoding and
is very straightforward.
> 
>                              History
> 
>   date         author                      changes
> -------- ------------------ ---------------------------------------------
>   ????       Bryan Ford     Created this file
> 01/08/88     Bryan Ford     Added ANAM and SECT, changed CRC chunk
> 
>                               THE END
> 
>         Bryan Ford           +-----------------------------------------+
> Snail:  1790 East 1400 North | A computer does what you tell it to do, |
>         Logan, UT 84321      | not what you want it to do.             |
> Bitnet: FATQW@USU            +------ Murphy's Law Calendar, 1986 ------+

Raymond S. Brand               Fido: 141/255  1-518-489-8968
                               Mail: ihnp4!sun!sunbow!beowulf!rsbx
                                     uunet!steinmetz!beowulf!rsbx
                               Snail: 3A Pinehurst Ave. Albany NY 12203
                               Voice: 1-518-482-8798