Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!ucbcad!pasteur!ucbvax!USU.BITNET!FATQW
From: FATQW@USU.BITNET
Newsgroups: comp.sys.amiga
Subject: Re: IFF archive proposal
Message-ID: <8801180410.AA09660@jade.berkeley.edu>
Date: 18 Jan 88 03:06:00 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Lines: 147

From: fatqw@usu.bitnet (Bryan Ford)
References: <589@gethen.UUCP> <6173@j.cc.purdue.edu>
Organization: Absolutely None, Inc.

(Sorry if I didn't explain things well...I'm not the best explainer!)

In article <589@gethen.UUCP> farren@gethen.UUCP (Michael J. Farren) writes:
>In article <6173@j.cc.purdue.edu> ain@j.cc.purdue.edu (Patrick White) writes:
>>
>>FORM ARC - Archive.
>>   The ARC form is a form for collecting more than one file into one file.
>>It can also specify subdirectories to be created before it is unarced, and
>>it can contain nested FORM ARCs as well as FLSTs.
>
>Subdirectories should probably always be handled as nested ARCs, as
>that will allow the de-arcing utility to determine whether or not to
>un-arc them.  There are times when you don't want to unarc all sub-
>directories, but need finer control.

That's exactly what I was saying; sorry if it was unclear.  I'll try to
revise the document.  Do you think it should be a requirement that all
nested FORM ARCs have SBDR chunks?  I think that would be quite reasonable,
since I can't think of any case where a nested ARC would NOT be a
subdirectory.
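For concreteness, here is a rough C sketch of how an archiver might write a nested FORM ARC that starts with its SBDR chunk.  It assumes the usual EA IFF 85 layout (4-byte ID, 4-byte big-endian size, odd-length data padded to an even boundary) and assumes the form type is written "ARC " with a trailing space, since IFF IDs are always four characters.  The function names are made up for illustration, and only the SBDR chunk is written; the FLSTs and other chunks would follow it inside the same FORM, with the FORM size updated to match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in big-endian (Motorola) byte order, as IFF requires. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Append one chunk (ID, size, data, pad byte if odd) at out.
   Returns the number of bytes written. */
static size_t put_chunk(uint8_t *out, const char id[4], const void *data, uint32_t len)
{
    memcpy(out, id, 4);
    put_be32(out + 4, len);          /* size excludes header and pad byte */
    memcpy(out + 8, data, len);
    if (len & 1) {                   /* pad odd-length data to an even boundary */
        out[8 + len] = 0;
        return 8 + (size_t)len + 1;
    }
    return 8 + (size_t)len;
}

/* Hypothetical helper: build a nested FORM ARC whose first chunk is an
   SBDR naming its subdirectory.  out must hold at least
   12 + 8 + strlen(dir) + 2 bytes.  Returns the total bytes written. */
size_t write_subdir_arc(uint8_t *out, const char *dir)
{
    uint32_t len;
    size_t body;
    assert(out != NULL && dir != NULL);
    len = (uint32_t)strlen(dir) + 1;           /* SBDR keeps the NUL */
    memcpy(out, "FORM", 4);
    memcpy(out + 8, "ARC ", 4);                /* assumed 4-char form type */
    body = put_chunk(out + 12, "SBDR", dir, len);
    put_be32(out + 4, (uint32_t)(4 + body));   /* FORM size counts the type ID */
    return 12 + body;
}
```

For example, write_subdir_arc(buf, "docs") produces a 26-byte FORM: the 12-byte FORM/ARC header plus an SBDR chunk holding "docs" and its NUL, padded from 5 to 6 bytes.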

>>   SBDR - Subdirectory.  This chunk contains a string of characters
>>terminated by a null, specifying a subdirectory for this ARC to be unarced
>>into.  If the specified subdirectory does not already exist, the unarcing
>>program will create it in the current directory (or the directory that a
>>parent ARC was unarced into).
>
>Again, should be OPTIONAL.  You might want to unarc into your current
>subdirectory, regardless of where the file originally came from.

I agree.  However, overriding the SBDR chunk is the unarchiver's job, not
the format's.  An unarchiver that offers the option may be considered a
"better" unarchiver; it's just another feature of "good" unarchivers.

Also, remember that the top-level FORM ARC does not need to have an SBDR
chunk.  If it has one, the whole archive will be unarced into that
subdirectory, which then serves as the root for everything else (unless
overridden, see above).  If it doesn't, the archive will be unarced into
the directory the user specifies, or into the current directory.
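A small C sketch of those directory rules, assuming AmigaDOS-style paths (a volume name like DH0: or a directory joined with '/').  resolve_unarc_dir and its ignore_sbdr flag are hypothetical names.  An unarchiver would call it once for the top-level ARC, passing the user's directory or "" for the current directory, then pass each result down as the base for that ARC's nested ARCs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Work out where one ARC level should be unarced.
   base:        user-specified directory, or the parent ARC's directory;
                "" means the current directory.
   sbdr:        this ARC's SBDR string, or NULL if it has no SBDR chunk.
   ignore_sbdr: nonzero if the user asked to override SBDR chunks.
   Writes the path into out; returns 0 on success, -1 if it doesn't fit. */
int resolve_unarc_dir(const char *base, const char *sbdr, int ignore_sbdr,
                      char *out, size_t outsize)
{
    size_t blen;
    int n;
    assert(base != NULL && out != NULL && outsize > 0);
    if (sbdr == NULL || ignore_sbdr) {
        /* No SBDR, or the user overrode it: stay in the base directory. */
        n = snprintf(out, outsize, "%s", base);
    } else {
        blen = strlen(base);
        /* AmigaDOS-style join: no '/' after an empty base, a volume
           name ending in ':', or a path already ending in '/'. */
        if (blen == 0 || base[blen - 1] == ':' || base[blen - 1] == '/')
            n = snprintf(out, outsize, "%s%s", base, sbdr);
        else
            n = snprintf(out, outsize, "%s/%s", base, sbdr);
    }
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

With base "DH0:work" and SBDR "src", the top-level ARC lands in "DH0:work/src"; a nested ARC with SBDR "lib" then resolves against that to "DH0:work/src/lib".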

>>   ANAM - Archiver name.  This contains a null-terminated string telling
>>the name of the program that created this archive.  This chunk, if included
>>at all, should only be included in the top-level ARC chunk.
>
>Not necessary if you agree on archival formats ahead of time.  Why
>would you want to (or even need to) know the name of the program that
>created the archive, if the archive were in a standard format?

This would be only for the user's information.  For example, a user gets
an archive somewhere and finds that it doesn't work.  He checks and finds
that it didn't get munged in the mail, on disk, or anywhere else.  Then he
calls up his archiver with the -Info option or whatever, and it shows the
name of the program that created this junky archive.  He posts it on
Usenet, saying, "Beware of this archiver: it's not compatible with the
rest!"  In other words, probably no archiving program would EVER want to
look at this chunk, except to display it to the user.

Anyway, if an archiver doesn't write one of these, so what?  If an
unarchiver doesn't understand it, so what?  It just skips the chunk, since
every IFF chunk carries its own length.  That's what IFF is all about!
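To show why skipping is cheap, here is a minimal C sketch that walks the chunks inside one FORM ARC (the bytes after its "ARC " type ID) and returns the ANAM string for an -Info style listing.  find_anam is a hypothetical helper; every other chunk, including any nested FORM, is simply stepped over using its size field plus the pad byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian value. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return the NUL-terminated ANAM string among the chunks in body,
   or NULL if there is none (or the data is truncated). */
const char *find_anam(const uint8_t *body, size_t len)
{
    size_t pos = 0;
    assert(body != NULL || len == 0);
    while (pos + 8 <= len) {                   /* room for another header */
        uint32_t size = get_be32(body + pos + 4);
        const uint8_t *data = body + pos + 8;
        if (size > len - pos - 8)
            return NULL;                       /* chunk runs past the end */
        if (memcmp(body + pos, "ANAM", 4) == 0 && size > 0 &&
            data[size - 1] == '\0')
            return (const char *)data;
        /* Unknown or uninteresting chunk: skip its data and pad byte. */
        pos += 8 + (size_t)size + (size & 1);
    }
    return NULL;
}
```

An unarchiver that has never heard of ANAM does exactly the same skip, which is why leaving the chunk out, or ignoring it, costs nothing.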

>>   LEVL - Multilevel marker.  This chunk contains one word of data -
>>the number of compression levels in the main chunk.  For example, an
>>archiver may detect that a certain file would be much better off if it was
>>crunched and then squeezed.
>
>Almost never needed, and the overhead involved in figuring out that
>multiple levels of compression could result in some savings will,
>very likely, overwhelm the advantages of doing the multi-compression.
>I would tend to reject this as an unnecessary complication, with
>little reward.

For now, you're right.  I thought of that, and it WOULD be much more
trouble than it's worth...NOW.  That's the thing.  It's there, so even if
it doesn't get used now, someone can use it later.  For example, if
processor speed mega-mupples (a far-off offshoot of doubles, triples,
etc.) and you want to CRAM a disk with stuff, it might be practical,
especially if you're about to go somewhere and can leave your computer
running for a while.  Anyway, I agree: not right now, but it's there if
needed.

>>   PACK - Packing.
>>   CRNC - Crunching.
>>   SQEZ - Squeezing.
>>   SQSH - Squashing.
     (My addition: STOR - Storing.  Don't forget this one!  The best!)
>
>Instead of this how about one: FRMT, which would contain one byte
>indicating the packing algorithm used to compress the file chunk,
>called DATA or BODY or whatever works.  This would allow future
>expansion more easily.

Maybe.  However, I have to disagree with the one-byte thing.  That's why
EA used four-byte chunk names: to avoid collisions.  For example, if we
define 0-5 as the standard formats and then somebody else comes along and
creates another format, how will he select a number?  Would he have to
send it to whoever's in charge of this thing and wait for a number?
Avoiding that kind of delay is another goal of IFF: keeping production
schedules on time.  At the very least, we should use four bytes, just like
any ordinary IFF identifier.

Basically, what you're saying is that for each program section we put in
a FRMT chunk instead of one of the others, and right after the chunk size
we store a byte (your version) or a longword (mine) identifying the
compression format.  Sounds great!  It also greatly reduces the chance of
name collisions.  For example, suppose one archiver adds a private chunk
just to speed up its own unarchiver, and another archiver invents a
compression algorithm with the same four-letter name...not so neat.  With
FRMT, the two live in separate name spaces: an ABCD chunk in the FLST form
can't be confused with a program segment encoded with the ABCD algorithm.
Under the current scheme, where each algorithm is its own chunk type,
there'd be lots of problems.  Not this way.  Thanks for the idea!
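A rough C sketch of the FRMT idea with a longword method ID.  The layout is an assumption: the chunk data is the four-character method ID followed by the compressed file data.  MAKE_ID, put_frmt and frmt_method_name are illustrative names (MAKE_ID just packs four characters into a big-endian longword, the usual IFF trick), and the method IDs are the five chunk names from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack four characters into a big-endian longword so IDs can be
   compared and switched on as plain numbers. */
#define MAKE_ID(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Write "FRMT", size (4 + len), the method ID, the data, and a pad
   byte if the size is odd.  Returns the number of bytes written. */
size_t put_frmt(uint8_t *out, uint32_t method, const uint8_t *data, uint32_t len)
{
    uint32_t size = 4 + len;            /* method ID + file data */
    int i;
    assert(out != NULL && (data != NULL || len == 0));
    memcpy(out, "FRMT", 4);
    for (i = 0; i < 4; i++) {           /* both fields big-endian */
        out[4 + i] = (uint8_t)(size >> (24 - 8 * i));
        out[8 + i] = (uint8_t)(method >> (24 - 8 * i));
    }
    if (len > 0)
        memcpy(out + 12, data, len);
    if (size & 1) {
        out[12 + len] = 0;
        return 12 + (size_t)len + 1;
    }
    return 12 + (size_t)len;
}

/* Name the method whose ID starts the FRMT data, or NULL if this
   unarchiver doesn't know it (so it can skip that file). */
const char *frmt_method_name(const uint8_t *frmt_data)
{
    uint32_t id;
    assert(frmt_data != NULL);
    id = MAKE_ID(frmt_data[0], frmt_data[1], frmt_data[2], frmt_data[3]);
    switch (id) {
    case MAKE_ID('S','T','O','R'): return "stored";
    case MAKE_ID('P','A','C','K'): return "packed";
    case MAKE_ID('C','R','N','C'): return "crunched";
    case MAKE_ID('S','Q','E','Z'): return "squeezed";
    case MAKE_ID('S','Q','S','H'): return "squashed";
    default:                       return NULL;
    }
}
```

Because method IDs only ever appear inside FRMT data, a new algorithm called ABCD never clashes with some archiver's private ABCD chunk.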

>Also, it should be the province of the program which creates the ARC
>to choose which schemes it prefers.

YES!  This is all archiver and unarchiver dependent.  For example, to get
things started, I might write a teeny weensy archiver that doesn't handle
subdirectories or anything complex and just supports packing!  Sort of a
dumb program, but it would still create files compatible with more
advanced archivers.

Also, what about password protection?  It could be a PSWD chunk in either
the FLST or ARC forms.  However, it wouldn't contain ANY data.  I've found
that the best password protection mechanisms are the ones that don't store
the password in the file.  If the user enters the wrong password, the
unarchiver still unarcs the file, but the result is total garbage: the
contents depend on the correct password being entered.  The PSWD chunk
would only tell the unarchiver that the user needs to enter a password for
this file or directory.  In this way, the encoding could be VERY simple.
For example, the first byte of the file is added to the first byte of the
password, ignoring overflow, the second byte to the second, and so on.
When the password runs out, it wraps around to the front.  Although
simple, it would keep out casual snoopers.  It isn't unbreakable, though:
anyone who can guess part of a file's contents can work out that part of
the password.
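A minimal C sketch of that scheme: each data byte gets the matching password byte added modulo 256, with the password wrapping around, and decoding subtracts.  pswd_encode and pswd_decode are hypothetical names.  Worth knowing: this is a repeating-key additive cipher (a Vigenère cipher on bytes), so if someone can guess some plaintext, such as a file that starts with "FORM", subtracting it from the scrambled bytes hands back those password bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scramble buf in place by adding password bytes, wrapping around the
   password and ignoring overflow (unsigned char arithmetic is mod 256). */
void pswd_encode(unsigned char *buf, size_t len, const char *pw)
{
    size_t i, plen = strlen(pw);
    assert(plen > 0);
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + (unsigned char)pw[i % plen]);
}

/* Undo pswd_encode by subtracting the same bytes.  With the wrong
   password this still "works" but leaves garbage, no error. */
void pswd_decode(unsigned char *buf, size_t len, const char *pw)
{
    size_t i, plen = strlen(pw);
    assert(plen > 0);
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] - (unsigned char)pw[i % plen]);
}
```

Nothing about the password is stored, so the unarchiver can't tell a right password from a wrong one; it just produces the right file or garbage, as described above.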

Thanks for your ideas!  Anybody else?


                                    Bryan

       Bryan Ford                  ///// A computer does what \\\\\
Snail: 1790 East 1400 North       ///// you tell it to do, not \\\\\
       Logan, UT 84321        \\\XX///  what you want it to do. \\\XX///
Email: FATQW@USU.BITNET        \XXXX/ Murphy's Law Calendar 1986 \XXXX/