Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!swrinde!ucsd!ucbvax!bloom-beacon!eru!hagbard!sunic!mcsun!hp4nl!charon!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: alt.sources.d
Subject: Re: Compressing alt.sex.pictures files
Message-ID: <2033@charon.cwi.nl>
Date: 29 Aug 90 21:40:30 GMT
References: <1990Aug28.192024.22435@zorch.SF-Bay.ORG> <6467@sugar.hackercorp.com>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 44

In article eillihca@fizzle.stanford.edu (Achille Hui) writes:
 > 1) just compress the raw image extracted from the gif files
 > 2) XOR neighbouring scan lines in raw image and then compress it.
 > 3) subtract neighbouring ......
 > 4) build a color index table for each scan line
 >
 > and it turns out the size of the resulting files ..
 > 1) ~ 90% of original gif
 > 2,3) ~ 110%
 > 4) ~ 100% if we exclude the color index table for the lines!
 >
It depends on the kind of compressor you use. If you used standard
compress, I am not very surprised; compress works very well if there are
multi-byte patterns to be found that will get shorter encodings. But
picture data (and also audio files) in general do not reveal such
patterns. For audio files much better compression is obtained when
successive samples are subtracted (or XOR'ed) and the result is Huffman
encoded. The reason this works well on audio files is that the
differences (or XOR's, but for audio I think differences are better) are
in general small numbers, so most of the results fall in a (very) small
range of possible values, and that is what Huffman is good at. (Note
that the lower the dynamics of the audio, the better Huffman will work.
I have seen Huffman compression used this way reduce files to 30-50% of
their size where LZW might even go above 100%.)

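A rough sketch of the idea in Python, under some loud assumptions: 8-bit samples, a synthetic sine-like signal standing in for real audio, and made-up function names (delta_encode, huffman_code_lengths and so on are illustrative, not taken from any existing source). It only computes Huffman code lengths and the resulting bit count; it does not write an actual bit stream.

```python
import heapq
import math
from collections import Counter

def delta_encode(samples):
    """Replace each 8-bit sample by its difference from the previous one (mod 256)."""
    prev = 0
    out = []
    for s in samples:
        out.append((s - prev) & 0xFF)
        prev = s
    return out

def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences (mod 256)."""
    prev = 0
    out = []
    for d in deltas:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return out

def huffman_code_lengths(data):
    """Count symbol frequencies and return {symbol: Huffman code length in bits}."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tiebreak, symbols in this subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths

def huffman_bits(data):
    """Total number of bits needed to Huffman-code data (code table not included)."""
    lengths = huffman_code_lengths(data)
    freq = Counter(data)
    return sum(freq[s] * lengths[s] for s in freq)

# Synthetic, slowly varying 8-bit signal (an assumption, not real audio).
samples = [int(128 + 100 * math.sin(i / 40.0)) for i in range(4000)]
deltas = delta_encode(samples)
raw_bits = 8 * len(samples)

print("Huffman on raw samples:", huffman_bits(samples) / raw_bits)
print("Huffman on differences:", huffman_bits(deltas) / raw_bits)
```

On a signal like this the differences take only a handful of distinct values, so their code lengths come out much shorter than those of the raw samples. How well this carries over to real audio or picture data depends on the dynamics of the data.
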
Translating this to pictures, we find that XOR'ing successive scanlines
will result in only a small number of distinct values (again depending
on the colour dynamics of the picture), and so Huffman might be
superior. A disadvantage is that Huffman is considerably slower than
LZW when compressing: Huffman has to scan the input twice, once to build
the code table and once to do the coding. But decompressing is only
marginally slower (the reason it is slower is that LZW works at the byte
level while Huffman works at the bit level).

Now if someone is willing to give it a try, somewhere in my sources I
have routines that do Huffman code table building, compression and
decompression. I know nothing about the GIF format.

(And I sure hope I spelled Huffman right :-))
-- 
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl