Path: utzoo!mnetor!uunet!husc6!hao!boulder!sunybcs!bingvaxu!leah!rsb584
From: rsb584@leah.Albany.Edu ( Raymond S Brand)
Newsgroups: comp.sys.amiga
Subject: Re: IFF archive proposal
Message-ID: <564@leah.Albany.Edu>
Date: 17 Jan 88 17:14:49 GMT
References: <6173@j.cc.purdue.edu>
Organization: The University at Albany, Computer Services Center
Lines: 105
Summary: Comments

> ANAM - Archiver name. This contains a null-terminated string telling
> the name of the program that created this archive. This chunk, if included
> at all, should only be included in the top-level ARC chunk.

Why is ANAM needed at all?

> NAME - Filename. This chunk contains a filename terminated with a null
> byte. This is a filename which the file will go into. This is a required
> chunk. There is still a limit to the length of the filename - 2,147,483,648
> characters to be exact. You should not need filenames longer than this.
> This was a major issue about the old ARC program which is dealt with here.

31 bits is rather large; don't you think 15 or even 7 is enough? Names
should be limited to ASCII characters in the range 20h to 7Eh, with the
archive extractor responsible for checking that a name is acceptable for
the system it is being extracted to (think about other systems that could
benefit from the new archive format).

> CRC - CRC check. This chunk was modified from its original definition
> to accomodate multiple program sections. The chunk contains as many words
> of data as there are sections in the file - one CRC for each section. If
> there are too many CRC words, an unarchiver will ignore the rest. If there
> are too few, the unarchiver will check only the sections with CRCs
> supplied. If there is no CRC chunks, no checking will be done.

CRCs are easy enough to do that they should be considered mandatory. Too
many or too few should be considered an ERROR.

> LEVL - Multilevel marker.
> This chunk contains one word of data -
> the number of compression levels in the main chunk. For example, an
> archiver may detect that a certain file would be much better off if it was
> crunched and then squeezed. This chunk, if included, indicates the number
> of levels in the main chunk. If it is a 1, then the main chunk simply
> contains the file. If it is a two, then the main chunk contains another
> chunk, indicating the same or a different compression method. Normally
> only one or two levels will be necessary, and usually only one. However,
> for text files, packing and then crunching may be the best compression
> method.

The need for this escapes me. Almost all compression methods already do
run-length encoding as part of their higher-level methods.

> CRNC - Crunching. As of yet, I don't have the docs for this format, but
> as soon as I get them, I'll include them here.

ARC Crunch is 12-bit Lempel-Ziv with run-length encoding as a prestep.

> SQEZ - Squeezing. Ditto.

ARC Squeeze is Huffman encoding with run-length encoding as a prestep.

> SQSH - Squashing. This one is controversial. It is used in PKARC for
> the IBM PC, but hasn't yet made it to the Amiga. Tell me what you think of
> including it. It will make archiving programs larger, but whether to
> include it or not depends on whether it will get used very often. Voice
> your feelings.

I believe that Squash is a 13-bit Lempel-Ziv encoding using a different
hash function than the one used in Crunch (don't quote me on this one).
Typically the benefit of Squash over Crunch is only a few percent, usually
on large files only.

> One final note: there is no requirement to sort archived files in any
> way, although archivers may want to sort them for the user's sake.

Sorting makes it easier for both the user and the archiving program (the
program doesn't need to search the entire archive looking for a preexisting
entry named GAME in order to add an entry named GAME, etc.).
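
To illustrate the sorting point, here is a minimal Python sketch, assuming
the archive's entry names are kept in memory as a sorted list. The helper
names find_entry and add_entry and the sample directory are hypothetical;
they are not part of the proposal or of any existing archiver.

```python
import bisect


def find_entry(names, name):
    """Return the index of name in the sorted list names, or -1 if absent."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return i
    return -1


def add_entry(names, name):
    """Insert name so that names stays sorted.

    Returns (index, replaced). replaced is True when an entry with the
    same name already existed, in which case the list is left unchanged
    and a real archiver would overwrite that entry.
    """
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return i, True
    names.insert(i, name)
    return i, False


# Hypothetical archive directory, already sorted by name.
directory = ["ARC.DOC", "GAME", "README"]
print(find_entry(directory, "GAME"))   # binary search, no full scan needed
print(add_entry(directory, "GAME"))    # existing entry is detected
print(add_entry(directory, "HELLO"))   # new entry lands in sorted position
print(directory)
```

With a sorted directory the lookup is a binary search rather than a scan of
every entry; an unsorted archive would have to be read end to end before the
program could know that GAME was not already present.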

> Although this document is not copyrighted or anything, please don't
> redistribute it very much. This is because it's only a draft, and it will
> probably get changed, and we want EVERYONE to have the same thing.

When you say everyone, this means MS-DOG users (sysops) also. Who is doing
that version?

> Please feel free to email suggestions for this file. Oh, and if someone
> has the docs on crunching and squeezing, please email them to me. Thanks.

There really are no docs on crunching other than the source. The method
used is derived from the Unix compress utility. Squeezing is Huffman
encoding and is very straightforward.

>
>                              History
>
> date      author              changes
> --------  ------------------  ---------------------------------------------
> ????      Bryan Ford          Created this file
> 01/08/88  Bryan Ford          Added ANAM and SECT, changed CRC chunk
>
>                              THE END
>
> Bryan Ford                    +-----------------------------------------+
> Snail: 1790 East 1400 North   | A computer does what you tell it to do, |
>        Logan, UT 84321        | not what you want it to do.             |
> Bitnet: FATQW@USU             +------ Murphy's Law Calendar, 1986 ------+

Raymond S. Brand
Fido:  141/255  1-518-489-8968
Mail:  ihnp4!sun!sunbow!beowulf!rsbx
       uunet!steinmetz!beowulf!rsbx
Snail: 3A Pinehurst Ave.
       Albany NY 12203
Voice: 1-518-482-8798