Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!swrinde!ucsd!ucbvax!bloom-beacon!eru!hagbard!sunic!mcsun!hp4nl!charon!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: alt.sources.d
Subject: Re: Compressing alt.sex.pictures files
Message-ID: <2033@charon.cwi.nl>
Date: 29 Aug 90 21:40:30 GMT
References: <1990Aug28.192024.22435@zorch.SF-Bay.ORG> <6467@sugar.hackercorp.com> <eillihca.651947189@fizzle.stanford.edu>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 44

In article <eillihca.651947189@fizzle.stanford.edu> eillihca@fizzle.stanford.edu (Achille Hui) writes:
 > 1) just compress the raw image extracted from the gif files
 > 2) XOR neighbouring scan lines in raw image and then compress it.
 > 3) subtract neighbouring ......
 > 4) build a color index table for each scan line
 > 
 > and it turns out the size of the resulting files ..
 > 1)	~ 90% of original gif
 > 2,3)	~ 110%
 > 4)	~ 100% if we exclude the color index table for the lines!
 > 
It depends on the kind of compressor you use.  If you used standard compress,
I am not very surprised; compress works well when there are recurring
multi-byte patterns that can be given shorter encodings.  But picture data
(and also audio files) in general do not show such patterns.

For audio files much better compression is obtained when successive samples
are subtracted (or XOR'ed) and the result is Huffman encoded.
The reason this works well on audio files is that the differences (or XORs,
though for audio I think differences are better) are in general small numbers,
so most of the results fall in a (very) small range of possible
values, and that is what Huffman is good at.  (Note that the lower the
dynamics of the audio, the better Huffman will work.  I have seen Huffman
compression used this way reduce files to 30-50% of their original size,
where LZW might even go above 100%.)
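
To make the idea concrete, here is a minimal sketch in Python.  It is not
taken from my own routines; the function names and the synthetic sine-wave
"audio" are assumptions chosen only for illustration.  It takes differences
of successive 8-bit samples and compares how many bits a Huffman code would
need for the raw samples and for the differences.

```python
import heapq
import math
from collections import Counter

def delta_encode(samples):
    """Replace each 8-bit sample by its difference from the previous one (mod 256)."""
    prev, out = 0, []
    for s in samples:
        out.append((s - prev) & 0xFF)
        prev = s
    return out

def delta_decode(deltas):
    """Undo delta_encode by accumulating the differences (mod 256)."""
    prev, out = 0, []
    for d in deltas:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return out

def huffman_bits(data):
    """Total number of bits a Huffman code built from data needs to encode data."""
    freq = Counter(data)
    if len(freq) <= 1:
        return len(data)  # a lone symbol still costs one bit per occurrence
    heap = list(freq.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        # Each merge adds one bit to every symbol below the new node.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

# Hypothetical test signal: a slow sine wave sampled as unsigned 8-bit values.
samples = [int(128 + 100 * math.sin(i / 50)) for i in range(2000)]
deltas = delta_encode(samples)
raw_bits = huffman_bits(samples)
delta_bits = huffman_bits(deltas)
print("raw:", raw_bits, "bits; differences:", delta_bits, "bits")
```

The bit counts ignore the cost of storing the code table, so on short inputs
the real gain would be somewhat smaller.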

Translating this to pictures, we find that XOR'ing successive scanlines
will result in only a small number of distinct values (again depending on
the colour dynamics of the picture), and so Huffman might be superior.
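
A sketch of the scanline step, again in Python and again only an
illustration: the image here is a made-up array of colour indices, not data
read from a real GIF, and the function names are hypothetical.  The first
line is kept as is so that the transform can be undone.

```python
def xor_scanlines(rows):
    """XOR every scanline with the one above it; the first line is unchanged."""
    if not rows:
        return []
    out = [bytes(rows[0])]
    for prev, cur in zip(rows, rows[1:]):
        out.append(bytes(a ^ b for a, b in zip(prev, cur)))
    return out

def unxor_scanlines(xored):
    """Inverse of xor_scanlines: rebuild each line from the line above it."""
    if not xored:
        return []
    rows = [bytes(xored[0])]
    for line in xored[1:]:
        rows.append(bytes(a ^ b for a, b in zip(rows[-1], line)))
    return rows

# Hypothetical 32x16 image of colour indices with broad bands of equal colour.
image = [bytes((x // 4 + y // 8) % 16 for x in range(32)) for y in range(16)]
xored = xor_scanlines(image)

# Count how many pixel values are zero before and after the transform.
zeros_before = sum(line.count(0) for line in image)
zeros_after = sum(line.count(0) for line in xored)
print("zero bytes before:", zeros_before, "after:", zeros_after)
```

Whether real pictures behave like this synthetic one depends on the picture;
dithered images in particular may not.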

A disadvantage is that Huffman is considerably slower than LZW when
compressing.  Huffman has to scan the input twice, once to build the code
table and once to do the coding.  But decompressing is only marginally slower
(it is slower because LZW works at the byte level while Huffman works at
the bit level).
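
The two passes and the bit-level decoding can be sketched as follows.  This
is a minimal Python illustration with hypothetical names, not the routines I
mentioned; for readability the bits are kept as a string of '0' and '1'
characters instead of being packed into bytes.

```python
import heapq
from collections import Counter

def build_code_table(data):
    """Pass 1: count the symbols and derive a prefix code {symbol: bit string}."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie breaker, partial code table of the subtree).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code below them.
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(data, table):
    """Pass 2: replace every symbol by its code."""
    return "".join(table[s] for s in data)

def decode(bits, table):
    """Read one bit at a time until the bits collected so far form a code."""
    inverse = {code: sym for sym, code in table.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out

data = list(b"abracadabra")
table = build_code_table(data)
bits = encode(data, table)
print(len(bits), "bits instead of", 8 * len(data))
```

The decoder has to look at every single bit, which is the bit-level work
referred to above; a real decompressor would also need the code table (or the
symbol counts) stored with the data.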

Now if someone is willing to give it a try, somewhere in my sources I have
routines that do Huffman code table building, compression and decompression.
I know nothing about the GIF format.

(And I sure hope I spelled Huffman right :-))
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl