Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!mcsun!hp4nl!cwi.nl!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.compression
Subject: Re: WANTED: Mac Compression Formats!
Message-ID: <3777@charon.cwi.nl>
Date: 28 Jun 91 00:51:29 GMT
References: <5429.28697186@nisc.ieee.org> <131516@jake.encore.com>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 59

This is really about why compression techniques used by commonly used
programs are not publicly available.

In article <131516@jake.encore.com> wcarroll@encore.com (Mr. New Dad) writes:
> And it was my impression that Compactor (now known as Compact Pro) uses
> a proprietary compression format(algorithm?),

I do not know whether it is proprietary. If I receive a program through
electronic means (as I did Compactor) there is no reason at all not to
look at the program and see what it does! The same holds for
UnstuffItDeluxe. So I know Compactor's and StuffItDeluxe's archive
formats and unpack them routinely on our Unix machines.

I think that (especially in the case of StuffItDeluxe) not publicising
(yup, I prefer British spelling) the format of the archives is
detrimental to the performance, and also to the acknowledgement of the
efforts of the people finding the algorithms. With StuffItDeluxe and
Compactor we have just two examples of the cases involved.

1. Compactor. Uses LZ77 followed by Huffman encoding of the tokens.
Uses a *very* clever scheme to transmit the Huffman table. The people
involved in it should be known for what they have done. So why does
Bill Goodman not make the format public?

2. StuffItDeluxe. The *BEST* format uses LZ77 followed by adaptive
Huffman encoding of the tokens. Alas, they use the very first
implementation of adaptive Huffman. (I do not have the reference ready,
but if you look up adaptive Huffman and trace back to the very first
article on the subject, there is your reference!)
The problem is that this implementation is not very efficient (rather,
it is extremely inefficient); there are later implementations that are
much more efficient. So why does Raymond Lau not make the format public?

Although Raymond Lau made a tremendous effort with his initial
implementations of StuffIt (with full documentation of the archive
format), he did not do so well with the later implementations.

I see only one reason not to make the archive format public: locking in
your customers. (Where is the description of the archive format of
Diamond? Of Disk Doubler?) If for some reason you lose access to a
Macintosh you will be unable to retrieve your archived data.

In my opinion it is always a good thing to document the internal
structure of the archives: people will be able to extract the contents
even if they do not have access to a machine where an official extractor
exists. They can write programs to do it.

So, if Bill Goodman or Raymond Lau (or one of their representatives) is
reading this, what is the reason for the non-disclosure? As far as I
know all PC archivers have publicly available (in source) de-archivers
(I may be wrong here of course). Furthermore, even if you make the
format public this does not mean that you make public the particular
algorithm used!
--
In case I need a disclaimer here: this is my opinion.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl