Path: utzoo!mnetor!uunet!ncc!alberta!edson!tic!dragos!work From: work@dragos.UUCP (Dragos Ruiu) Newsgroups: comp.dcom.modems Subject: Re: Merits of compression Message-ID: <406@dragos.UUCP> Date: 22 Apr 88 01:39:22 GMT References: <8804110136.AA16978@ucbvax.Berkeley.EDU> <15612@onfcanim.UUCP> <1473@pt.cs.cmu.edu> Organization: UNIX Device, Canada Lines: 63 Summary: You can turn off lzw... In article <1473@pt.cs.cmu.edu>, tgl@ZOG.CS.CMU.EDU (Tom Lane) writes: > > One of the few bad features of LZW is that it is very hard to turn it on and > off "on the fly" like this. The problem is that the compression and > decompression symbol tables must be kept in sync, or decompression will be > impossible. Compressing a chunk of text nearly always gives rise to new > entries in the table, so you cannot easily compress a packet and then decide > not to send it in compressed form. (The decompressor must be run against > the compressed packet in order to update its own table.) [edit] > A simple way of doing it would be to flush both tables whenever you decided > that compression should be temporarily disabled. You then couldn't turn > compression back on until you had seen some number N of packets that showed > adequate compression *starting from an empty table*. I suspect that this > would hurt your overall results badly, but I don't have any data to prove it. > Anybody know of any actual research that's been done along this line? From Welch's original paper (IEEE Computer June 1984), he says that LZW has a 5K adaptation zone (he was using fixed 12 bit codes). My personal experience has been that this is a conservative estimate. On my compressed packet machine experiment we found that on the average it took about 1.5-3K before compression started to show a win with an empty table. This surprised us, but we didn't have time to research it thoroughly. So these are all ballpark guestimates. This was using a fixed 12 bit code with the average files kicking around a Un*x system. A variable bit implementation should find the 'win zone' much quicker. If you flushed the tables when compression started going awry, you would would keep compressing but ignoring the compressed output. When the compression ratio goes back up again to something you like, you flush the tables again and tell the other side to start decompressing once more. Using Welch's 5K figure you lose 5K to find out the initial compression rate has gone back to an acceptable level and you lose another 5 K before your compression adapts back to a good level. So in this worst case scenario you send 10K of data that could have been compressed. I think that real world use would see you lose a max of about 4K of data that could have been compressed. On the plus side though, you gain automagic compression shutoff, and the benefits in throughput when your data stream sometimes contains pre-compressed data is a *big* win. To get this win you would have to keep track of the compression ratios of the last N packets, and only turn off compression when a history of badly compressed packets shows up. If your history window is too small, you may take too many 4K 'hits' to make your compression less effective. But when a xxxK news batch goes through, the modem is going to have many chances to realize compression is not a good idea. I'll agree with Tom that perhaps asking this of Telebit might be much, as the problem is more difficult than the off the cuff analysis would indicate. Maybe they can do it later.... (And maybe they can do V.32 and FAX and maybe the Usenet deal will come back and maybe I'll someday afford one :-) > tom lane > Internet: tgl@zog.cs.cmu.edu > UUCP: !zog.cs.cmu.edu!tgl > BITNET: tgl%zog.cs.cmu.edu@cmuccvma -- Dragos Ruiu ruiu@dragos.UUCP ...alberta!dragos!ruiu "cat ansi.c | grep -v noalias >proper.c"