Path: utzoo!utgpu!water!watmath!clyde!rutgers!husc6!purdue!i.cc.purdue.edu!j.cc.purdue.edu!ain
From: ain@j.cc.purdue.edu (Patrick White)
Newsgroups: comp.sys.amiga
Subject: IFF archive proposal
Message-ID: <6173@j.cc.purdue.edu>
Date: 14 Jan 88 17:19:31 GMT
Reply-To: ain@j.cc.purdue.edu (Patrick White)
Organization: PUCC Land, USA
Lines: 152

Bryan Ford has been bouncing an IFF archive idea off me for the last few weeks. We finally decided to post it to get a wider base of opinions (and also because I don't know much about IFF formats). So, following this blurb is the proposal he sent me.

Please send your comments to Bryan (FATQW@USU.bitnet), as I'm just commenting on it like everybody else... if you can't get mail through to him, you can send it to me and I'll forward it.

I'm posting this because we have not figured out how Bryan can post from bitnet yet -- any pointers would be greatly appreciated.

Thanks.

Pat White (ain@j.cc.purdue.edu | patwhite@purccvm)

========================================

There are two sections in this document. The first one describes the ARC form, and the second describes the FLST form. Note that a FLST form does not necessarily need to be included in an ARC form, but without an enclosing ARC the multifile and subdirectory capabilities are lost. It is the unlimited nesting capability of the IFF format that makes this file format possible. Thanks EA!

An archive file is made up of zero or more ARC chunks, with FLST chunks as their "children." In other words, the ARC chunks are the tree and branches, while the FLST chunks are the "leaves".

FORM ARC - Archive.
The ARC form is a form for collecting more than one file into one file. It can also specify subdirectories to be created before it is unarced, and it can contain nested FORM ARCs as well as FLSTs. Following is a description of the various chunks that may appear within an ARC.

SBDR - Subdirectory.
This chunk contains a string of characters terminated by a null, specifying a subdirectory for this ARC to be unarced into. If the specified subdirectory does not already exist, the unarcing program will create it in the current directory (or the directory that a parent ARC was unarced into). Note that for languages that require a length byte followed by the string, the chunk length minus one may be used as the length. This is because of the extra zero required at the end.

ANAM - Archiver name.
This contains a null-terminated string telling the name of the program that created this archive. This chunk, if included at all, should only be included in the top-level ARC chunk.

FORM ARC - A child ARC.
This is useful for "sub-archives" which, when unarced, go into various directories automatically. A child ARC doesn't necessarily need a SBDR chunk, but it makes little sense otherwise.

FORM FLST - File chunks.
These chunks contain the actual files which make up the archive. They are not required in an archive, and in some cases leaving them out may be useful. For one example, maybe an archive wants to create several subdirectories which contain files, but no files in the root. As another example, you may want a child ARC to be completely empty except for a SBDR chunk - for example a "saved games" directory without any saved games. In other words, this may be useful for just creating directories to be used later, without putting any files in them.

FORM FLST - File.
These forms contain files. Each may contain several chunks described below.

NAME - Filename.
This chunk contains a filename terminated with a null byte. This is the name of the file that the data will be written to. This is a required chunk. There is still a limit to the length of the filename - 2,147,483,648 characters to be exact. You should not need filenames longer than this. Filename length was a major issue with the old ARC program, and it is dealt with here.

LEN - Length.
This is a required chunk which contains one longword of data - the actual (uncompressed) size of the file. This may be useful for unarcing programs to check the free space on the disk before they start to write a long file.

SECT - Sections.
This chunk is not required, and if omitted, its default value is 1. It contains one word of data: the number of sections, and thus the number of compression chunks, in the file. This came out of an idea from Pat White (ain@j.cc.purdue.edu). It allows files to be split up, so if part of a file is munged, the rest of the file may be salvaged.

For example, if an archiving program detects that it is archiving an IFF file of some kind, it checks to see what the top chunk is. If it's a FORM or a LIST, it gets compressed as simply one section, and no SECT chunk is written. However, if the top chunk is a CAT, then the archiving program breaks up the file into multiple compression chunks, and includes a SECT chunk with the number of compression chunks. Each compression chunk will contain one of the chunks in the CAT, and an unarchiver can rebuild the structure by CATing all the compression chunks together. If one is bad, the rest will get CATted, so the user can still get part of the file back.

This would be good for multipage documents. For example, if one page gets munged, the unarchiving program would restore the other pages.

CRC - CRC check.
This chunk was modified from its original definition to accommodate multiple program sections. The chunk contains as many words of data as there are sections in the file - one CRC for each section. If there are too many CRC words, an unarchiver will ignore the rest. If there are too few, the unarchiver will check only the sections with CRCs supplied. If there is no CRC chunk, no checking will be done.

LEVL - Multilevel marker.
This chunk contains one word of data - the number of compression levels in the main chunk.
For example, an archiver may detect that a certain file would be much better off if it were crunched and then squeezed. This chunk, if included, indicates the number of levels in the main chunk. If it is 1, then the main chunk simply contains the file. If it is 2, then the main chunk contains another chunk, indicating the same or a different compression method. Normally only one or two levels will be necessary, and usually only one. However, for text files, packing and then crunching may be the best compression method.

The following chunks are the main data chunks - they must appear after the chunks listed above, and there must be one and only one in the file (although a main chunk may contain another one).

STOR - Storage without compression.
This is usually used for very small files which would not gain anything in compression. The chunk's data is an exact duplicate of what will go into the file. Although other main chunks within STORs are allowed, there is no reason to use them - files should normally have one or more nested compression chunks, or a STOR if the file can't be compressed.

PACK - Packing.
This format consists of a series of bytes with replications packed down to one. The format is simple: the bytes in the PACK chunk exactly duplicate those in the uncompressed file, except where there are three or more of the same byte in a row. In this case, the format is the byte which repeats, then a hex 90, then the number of extra bytes. A hex 90 followed by a zero denotes the value of hex 90 in the stream.

For example, if the text is "ABCCCCDEF", then the packed format would be "ABC<90H><3>DEF". Notice that the value after the 90H is one less than the number of duplicate bytes actually in the file. In other words, this means "put 3 MORE C's here". Note that this is not the same as the packing algorithm used in the older ARC file.

CRNC - Crunching.
As of yet, I don't have the docs for this format, but as soon as I get them, I'll include them here.

SQEZ - Squeezing.
Ditto.
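
The PACK description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the names pack and unpack are made up, the count after the hex 90 is assumed to be a single byte (so a run is split after 256 bytes), and runs of the hex 90 byte itself are assumed never to be run-packed, since the draft does not say how they should be handled.

```python
MARKER = 0x90  # the "hex 90" run marker from the PACK description


def pack(data: bytes) -> bytes:
    """Pack runs of three or more equal bytes as: byte, 0x90, extra count."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        # Measure the run starting at i. Assumption: the count is one byte,
        # so a single packed run covers at most 256 bytes (1 + 255 extra).
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 256:
            run += 1
        if b == MARKER:
            # Assumption: marker bytes are never run-packed; each literal
            # 0x90 is written as 0x90 followed by a zero.
            out += bytes([MARKER, 0]) * run
        elif run >= 3:
            out += bytes([b, MARKER, run - 1])
        else:
            # Runs of one or two bytes are copied unchanged.
            out += bytes([b]) * run
        i += run
    return bytes(out)


def unpack(packed: bytes) -> bytes:
    """Reverse pack(): expand each 'byte, 0x90, n' into n more copies."""
    out = bytearray()
    i = 0
    while i < len(packed):
        b = packed[i]
        if b != MARKER:
            out.append(b)
            i += 1
            continue
        if i + 1 >= len(packed):
            raise ValueError("truncated PACK data after marker")
        count = packed[i + 1]
        if count == 0:
            out.append(MARKER)          # 0x90 0x00 is a literal 0x90
        elif not out:
            raise ValueError("run marker with no preceding byte")
        else:
            out += bytes([out[-1]]) * count  # "put count MORE bytes here"
        i += 2
    return bytes(out)
```

With these assumptions, pack(b"ABCCCCDEF") gives the bytes "ABC", hex 90, 3, "DEF", matching the example in the text, and unpack() of that result gives back the original nine bytes.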

SQSH - Squashing.
This one is controversial. It is used in PKARC for the IBM PC, but hasn't yet made it to the Amiga. Tell me what you think of including it. It will make archiving programs larger, but whether to include it or not depends on whether it will get used very often. Voice your feelings.

When better compression algorithms come out, they may be added to the FORM ARC. However, these will NOT be upward compatible - programs which use them will not be compatible with programs which don't.

One final note: there is no requirement to sort archived files in any way, although archivers may want to sort them for the user's sake.

Although this document is not copyrighted or anything, please don't redistribute it very much. This is because it's only a draft, and it will probably get changed, and we want EVERYONE to have the same thing.

Please feel free to email suggestions for this file. Oh, and if someone has the docs on crunching and squeezing, please email them to me. Thanks.

History
date      author              changes
--------  ------------------  ---------------------------------------------
????      Bryan Ford          Created this file
01/08/88  Bryan Ford          Added ANAM and SECT, changed CRC chunk

THE END

Bryan Ford                    +-----------------------------------------+
Snail: 1790 East 1400 North   | A computer does what you tell it to do, |
       Logan, UT 84321        | not what you want it to do.             |
Bitnet: FATQW@USU             +------ Murphy's Law Calendar, 1986 ------+