Path: utzoo!utgpu!water!watmath!clyde!rutgers!husc6!purdue!i.cc.purdue.edu!j.cc.purdue.edu!ain
From: ain@j.cc.purdue.edu (Patrick White)
Newsgroups: comp.sys.amiga
Subject: IFF archive proposal
Message-ID: <6173@j.cc.purdue.edu>
Date: 14 Jan 88 17:19:31 GMT
Reply-To: ain@j.cc.purdue.edu (Patrick White)
Organization: PUCC Land, USA
Lines: 152


   Bryan Ford has been bouncing an IFF archive idea he had off of me for the
last few weeks.  We finally decided to post it to get a wider base of opinions
(and also because I don't know much about IFF formats).
   So, following this blurb is the proposal he sent me.  Please send your
comments to Bryan (FATQW@USU.bitnet) as I'm just commenting on it like
everybody else... if you can't get mail through to him, you can send it to
me and I'll forward it.

   I'm posting this because we have not figured out how Bryan can post from
bitnet yet -- any pointers would be greatly appreciated.

   Thanks.

Pat White (ain@j.cc.purdue.edu | patwhite@purccvm)

========================================

   There are two sections in this document.  The first one describes the
ARC form, and the second describes the FLST form.  Note that a FLST form
does not have to be enclosed in an ARC form, but a FLST on its own loses
the multifile and subdirectory capabilities.
    It is the unlimited nesting capability of the IFF format that makes
this file format possible.  Thanks EA!
    An archive file is made up of zero or more ARC chunks, with FLST chunks
as their "children."  In other words, the ARC chunks are the tree and
branches, while the FLST chunks are the "leaves".

FORM ARC - Archive.
   The ARC form is a form for collecting more than one file into one file.
It can also specify subdirectories to be created before it is unarced, and
it can contain nested FORM ARCs as well as FLSTs.
   Following is a description of the various chunks that may appear
within an ARC.
   SBDR - Subdirectory.  This chunk contains a string of characters
terminated by a null, specifying a subdirectory for this ARC to be unarced
into.  If the specified subdirectory does not already exist, the unarcing
program will create it in the current directory (or the directory that a
parent ARC was unarced into).  Note that for languages that require a
length byte followed by the string, the chunk length minus one may be used
as the length.  This is because of the extra zero required at the end.
   ANAM - Archiver name.  This contains a null-terminated string telling
the name of the program that created this archive.  This chunk, if included
at all, should only be included in the top-level ARC chunk.
   FORM ARC - A child ARC.  This is useful for "sub-archives" which, when
unarced, go into various directories automatically.  A child ARC doesn't
necessarily need a SBDR chunk, but it makes little sense otherwise.
   FORM FLST - File chunks.  These chunks contain the actual files which
make up the archive.  An ARC is not required to contain any, and in some
cases leaving them out is useful.  For one example, an archive may want to
create several subdirectories which contain files, but put no files in the
root.  As another example, you may want a child ARC to be completely empty
except for a SBDR chunk - for example a "saved games" directory without any
saved games.  In other words, an ARC with no FLSTs is useful for just
creating a directory to be used later, without putting any files in it.
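
   As a rough illustration of the ARC layout described above, here is a
minimal Python sketch that builds a top-level ARC holding an ANAM chunk and
one empty child ARC for a "saved games" directory.  It assumes the usual
IFF chunk layout (4-byte ID, big-endian 32-bit length, data, pad byte after
odd-length data).  Padding the form type to "ARC " with a space is my
assumption, since the proposal does not say how the three-letter ID is
stored.  The helper names and the archiver name "sketch-arc" are made up
for the example.

```python
import struct

def chunk(ckid: bytes, data: bytes) -> bytes:
    """Build one IFF chunk: 4-byte ID, 32-bit big-endian length, data, pad."""
    assert len(ckid) == 4
    pad = b"\0" if len(data) % 2 else b""   # chunks start on even offsets
    return ckid + struct.pack(">I", len(data)) + data + pad

def form(form_type: bytes, *children: bytes) -> bytes:
    """Build a FORM: a chunk whose data is a form type plus child chunks."""
    return chunk(b"FORM", form_type + b"".join(children))

def cstring(text: str) -> bytes:
    """Null-terminated string, as used by SBDR and ANAM."""
    return text.encode("ascii") + b"\0"

# Child ARC that only creates an empty "saved games" directory.
saved = form(b"ARC ", chunk(b"SBDR", cstring("saved games")))

# Top-level ARC: archiver name (hypothetical) plus the child ARC.
archive = form(b"ARC ", chunk(b"ANAM", cstring("sketch-arc")), saved)

# For a length-prefixed string, the SBDR chunk length minus one is the
# string length, because of the trailing null.
sbdr_length = len(cstring("saved games"))
```

   Here sbdr_length - 1 equals the length of "saved games", which is the
"chunk length minus one" rule from the SBDR description.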

FORM FLST - File
   These forms contain files.  Each may contain several chunks described
below.
   NAME - Filename.  This chunk contains a filename terminated with a null
byte.  This is the name of the file that the data will be written to.  This
is a required chunk.  There is still a limit to the length of the filename -
2,147,483,648 characters to be exact.  You should not need filenames longer
than this.  Filename length was a major problem with the old ARC program,
and it is dealt with here.
   LEN - Length.  This is a required chunk which contains one longword of
data - the actual (uncompressed) size of the file.  This may be useful for
unarcing programs to check the free space on the disk before they start to
write a long file.
   SECT - Sections.  This chunk is not required, and if omitted, its
default value is 1.  It contains one word of data: the number of sections,
and thus number of compression chunks, in the file.  This came out of an
idea from Pat White (ain@j.cc.purdue.edu).  It allows files to be split up,
so if part of a file is munged, the rest of the file may be salvaged.  For
example, if an archiving program detects that it is archiving an IFF file
of some kind, it checks to see what the top chunk is.  If it's a FORM or a
LIST, it gets compressed as a single section, and no SECT chunk is
written.  However, if the top chunk is a CAT, then the archiving program
breaks up the file into multiple compression chunks, and includes a SECT
chunk with the number of compression chunks.  Each compression chunk will
contain one of the chunks in the CAT, and an unarchiver can rebuild the
structure by CATing all the compression chunks together.  If one is bad,
the rest will get CATted, so the user can still get part of the file back.
This would be good for multipage documents.  For example, if one page gets
munged, the unarchiving program would restore the other pages.
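
   The CAT splitting idea can be sketched in Python as follows.  This is
only an illustration under assumptions: it takes the standard IFF CAT
layout ("CAT ", 32-bit big-endian length, 4-byte contents type, then the
child chunks), treats each child chunk as one section, and marks a munged
section as None when rebuilding.  The function names and the "PAGE" and
"BODY" IDs are invented for the example.

```python
import struct

def chunk(ckid, data):
    """One IFF chunk with an even-length pad byte where needed."""
    pad = b"\0" if len(data) % 2 else b""
    return ckid + struct.pack(">I", len(data)) + data + pad

def split_cat(cat_file):
    """Split a CAT into its contents type and a list of child chunks."""
    assert cat_file[:4] == b"CAT "
    (size,) = struct.unpack(">I", cat_file[4:8])
    contents_type = cat_file[8:12]
    body = cat_file[12:8 + size]
    sections = []
    pos = 0
    while pos + 8 <= len(body):
        (ck_size,) = struct.unpack(">I", body[pos + 4:pos + 8])
        end = pos + 8 + ck_size + (ck_size & 1)   # include the pad byte
        sections.append(body[pos:end])
        pos = end
    return contents_type, sections

def rebuild_cat(contents_type, sections):
    """CAT the surviving sections back together, skipping bad ones (None)."""
    good = b"".join(s for s in sections if s is not None)
    return chunk(b"CAT ", contents_type + good)

# A two-page document, each page a small FORM.
page1 = chunk(b"FORM", b"PAGE" + chunk(b"BODY", b"one"))
page2 = chunk(b"FORM", b"PAGE" + chunk(b"BODY", b"two"))
cat = chunk(b"CAT ", b"PAGE" + page1 + page2)

contents_type, sections = split_cat(cat)
sect_count = len(sections)                  # value for the SECT chunk

# Pretend page one failed its check: the other page is still recovered.
salvaged = rebuild_cat(contents_type, [None, sections[1]])
```

   With every section intact, rebuild_cat() gives back the original CAT
byte for byte.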
   CRC - CRC check.  This chunk was modified from its original definition
to accommodate multiple file sections.  The chunk contains as many words
of data as there are sections in the file - one CRC for each section.  If
there are too many CRC words, an unarchiver will ignore the rest.  If there
are too few, the unarchiver will check only the sections with CRCs
supplied.  If there is no CRC chunk, no checking will be done.
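
   The matching rules for CRC words and sections might look like this in
Python.  Treat it as a sketch only: the proposal does not name a CRC
polynomial, so binascii.crc_hqx (a 16-bit CRC-CCITT) stands in for it, and
I am assuming the CRC is taken over the section bytes exactly as passed in.
The function names are hypothetical.

```python
import binascii
import struct

def crc16(data):
    # Assumption: CRC-CCITT stands in for the unspecified 16-bit CRC.
    return binascii.crc_hqx(data, 0)

def check_sections(sections, crc_chunk):
    """Check each section against its CRC word.

    Returns one entry per section: True or False where a CRC word was
    supplied, None where there was no CRC word to check against.
    Extra CRC words beyond the number of sections are ignored.
    """
    count = len(crc_chunk) // 2
    crcs = struct.unpack(">%dH" % count, crc_chunk[:count * 2])
    results = []
    for index, section in enumerate(sections):
        if index < len(crcs):
            results.append(crc16(section) == crcs[index])
        else:
            results.append(None)      # too few CRCs: leave unchecked
    return results

sections = [b"page one", b"page two", b"page three"]

# Two CRC words for three sections; the second one is deliberately wrong.
crc_chunk = struct.pack(">2H", crc16(sections[0]), crc16(sections[1]) ^ 1)
results = check_sections(sections, crc_chunk)
```

   Passing an empty crc_chunk models a missing CRC chunk: every section
comes back as None, meaning no checking was done.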
   LEVL - Multilevel marker.  This chunk contains one word of data -
the number of compression levels in the main chunk.  For example, an
archiver may detect that a certain file would be much better off if it was
crunched and then squeezed.  This chunk, if included, indicates the number
of levels in the main chunk.  If it is 1, then the main chunk simply
contains the file.  If it is 2, then the main chunk contains another
chunk, indicating the same or a different compression method.  Normally
only one or two levels will be necessary, and usually only one.  However,
for text files, packing and then crunching may be the best compression
method.
   The following chunks are the main data chunks - they must appear below
the chunks listed above, and there must be one and only one in the file
(although a main chunk may contain another one).
   STOR - Storage without compression.  This is usually used for very small
files which would not gain anything in compression.  The chunk's data is an
exact duplicate of what will go into the file.  Although other main chunks
nested within a STOR are allowed, there is no reason to do so - files should
normally have one or more nested compression chunks, or a single STOR if the
file can't be compressed.
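
   To make the FLST layout concrete, here is a small Python sketch that
writes a FLST with NAME, LEN and STOR chunks and reads it back.  It assumes
standard IFF chunk framing, a 32-bit big-endian longword in the length
chunk, and a space-padded "LEN " chunk ID (the proposal only says LEN).
The helper names and the file name "readme" are made up for illustration.

```python
import struct

def chunk(ckid, data):
    """One IFF chunk: ID, 32-bit big-endian length, data, optional pad."""
    pad = b"\0" if len(data) % 2 else b""
    return ckid + struct.pack(">I", len(data)) + data + pad

def iter_chunks(body):
    """Yield (ID, data) for each chunk in a FORM body, skipping pad bytes."""
    pos = 0
    while pos + 8 <= len(body):
        ckid = body[pos:pos + 4]
        (size,) = struct.unpack(">I", body[pos + 4:pos + 8])
        yield ckid, body[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)

contents = b"hello"

# Header chunks first, then the single main data chunk (STOR here).
flst_body = (chunk(b"NAME", b"readme\0")
             + chunk(b"LEN ", struct.pack(">I", len(contents)))
             + chunk(b"STOR", contents))
flst = chunk(b"FORM", b"FLST" + flst_body)

# Reading it back: skip "FORM", the length and the "FLST" type (12 bytes).
fields = dict(iter_chunks(flst[12:]))
name = fields[b"NAME"].rstrip(b"\0").decode("ascii")
(length,) = struct.unpack(">I", fields[b"LEN "])
data = fields[b"STOR"]
```

   An unarchiver could compare length against the free space on the disk
before writing data out under name.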
   PACK - Packing.  This format consists of a series of bytes with
replications packed down to one.  The format is simple: the bytes in the
PACK chunk exactly duplicate those in the uncompressed file, except where
there are three or more of the same byte in a row.  In this case, the format
is the byte which repeats, then a hex 90, then the number of extra bytes.  A
hex 90 followed by a zero denotes the value of hex 90 in the stream.  For
example, if the text is "ABCCCCDEF", then the packed format would be
"ABC<90H><3>DEF".  Notice that the value after the 90H is one less than the
number of duplicate bytes actually in the file.  In other words, this means
"put 3 MORE C's here".  Note that this is not the same as the packing
algorithm used in the older ARC file.
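
   The PACK rules above are short enough to sketch in Python.  Two things
here are my assumptions rather than part of the proposal: a run longer
than 256 bytes is split so that the count fits in one byte, and a run of
hex 90 bytes is written as the escaped literal (90H, 0) followed by a
normal repeat marker.  The function names are hypothetical, and the
decoder does no error handling for truncated input.

```python
MARK = 0x90   # repeat marker; MARK followed by 0 means a literal 0x90

def pack(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        # Measure the run starting here, capped so the count fits a byte.
        run = 1
        while i + run < len(data) and data[i + run] == byte and run < 256:
            run += 1
        out.append(byte)
        if byte == MARK:
            out.append(0)                 # escape a literal 0x90
        if run >= 3:
            out += bytes([MARK, run - 1])  # "put run-1 MORE bytes here"
            i += run
        else:
            i += 1                        # runs of 1 or 2 are copied as-is
    return bytes(out)

def unpack(packed: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(packed):
        byte = packed[i]
        i += 1
        if byte != MARK:
            out.append(byte)
            continue
        count = packed[i]
        i += 1
        if count == 0:
            out.append(MARK)              # escaped literal 0x90
        else:
            out += out[-1:] * count       # repeat the previous byte
    return bytes(out)

packed = pack(b"ABCCCCDEF")               # b"ABC\x90\x03DEF"
restored = unpack(packed)
```

   The packed result matches the "ABC<90H><3>DEF" example in the text, and
unpack() returns the original nine bytes.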
   CRNC - Crunching.  As of yet, I don't have the docs for this format, but
as soon as I get them, I'll include them here.
   SQEZ - Squeezing.  Ditto.
   SQSH - Squashing.  This one is controversial.  It is used in PKARC for
the IBM PC, but hasn't yet made it to the Amiga.  Tell me what you think of
including it.  It will make archiving programs larger, but whether to
include it or not depends on whether it will get used very often.  Voice
your feelings.
   When better compression algorithms come out, they may be added to the
FORM ARC.  However, these will NOT be upward compatible - programs which use
them will not be compatible with programs which don't.
   One final note: there is no requirement to sort archived files in any
way, although archivers may want to sort them for the user's sake.
   Although this document is not copyrighted or anything, please don't
redistribute it very much.  This is because it's only a draft, and it will
probably get changed, and we want EVERYONE to have the same thing.
   Please feel free to email suggestions for this file.  Oh, and if someone
has the docs on crunching and squeezing, please email them to me.  Thanks.

                             History

  date         author                      changes
-------- ------------------ ---------------------------------------------
  ????       Bryan Ford     Created this file
01/08/88     Bryan Ford     Added ANAM and SECT, changed CRC chunk

                              THE END

        Bryan Ford           +-----------------------------------------+
Snail:  1790 East 1400 North | A computer does what you tell it to do, |
        Logan, UT 84321      | not what you want it to do.             |
Bitnet: FATQW@USU            +------ Murphy's Law Calendar, 1986 ------+