Path: utzoo!mnetor!uunet!ncc!alberta!edson!tic!dragos!work
From: work@dragos.UUCP (Dragos Ruiu)
Newsgroups: comp.dcom.modems
Subject: Re: Merits of compression
Message-ID: <406@dragos.UUCP>
Date: 22 Apr 88 01:39:22 GMT
References: <8804110136.AA16978@ucbvax.Berkeley.EDU> <15612@onfcanim.UUCP> <1473@pt.cs.cmu.edu>
Organization: UNIX Device, Canada
Lines: 63
Summary: You can turn off lzw...

In article <1473@pt.cs.cmu.edu>, tgl@ZOG.CS.CMU.EDU (Tom Lane) writes:
> 
> One of the few bad features of LZW is that it is very hard to turn it on and
> off "on the fly" like this.  The problem is that the compression and
> decompression symbol tables must be kept in sync, or decompression will be
> impossible.  Compressing a chunk of text nearly always gives rise to new
> entries in the table, so you cannot easily compress a packet and then decide
> not to send it in compressed form.  (The decompressor must be run against
> the compressed packet in order to update its own table.) 
[edit]
> A simple way of doing it would be to flush both tables whenever you decided
> that compression should be temporarily disabled.  You then couldn't turn
> compression back on until you had seen some number N of packets that showed
> adequate compression *starting from an empty table*.  I suspect that this
> would hurt your overall results badly, but I don't have any data to prove it.
> Anybody know of any actual research that's been done along this line?

  From Welch's original paper (IEEE Computer June 1984), he says that LZW
  has a 5K adaptation zone (he was using fixed 12 bit codes). My personal
  experience has been that this is a conservative estimate. On my compressed
  packet machine experiment we found that on the average it took about 1.5-3K
  before compression started to show a win with an empty table. This surprised
  us, but we didn't have time to research it thoroughly. So these are all
  ballpark guestimates.  This was using a fixed 12 bit code with the average
  files kicking around a Un*x system. A variable bit implementation should
  find the 'win zone' much quicker.
   
  If you flushed the tables when compression started going awry, you would
  would keep compressing but ignoring the compressed output. When the 
  compression ratio goes back up again to something you like, you flush
  the tables again and tell the other side to start decompressing once more.
  Using Welch's 5K figure you lose 5K to find out the initial compression
  rate has gone back to an acceptable level and you lose another 5 K before
  your compression adapts back to a good level. So in this worst case
  scenario you send 10K of data that could have been compressed. I think
  that real world use would see you lose a max of about 4K of data that could
  have been compressed.
   
  On the plus side though, you gain automagic compression shutoff, and the
  benefits in throughput when your data stream sometimes contains
  pre-compressed data is a *big* win. To get this win you would have to
  keep track of the compression ratios of the last N packets, and only
  turn off compression when a history of badly compressed packets shows
  up. If your history window is too small, you may take too many 4K 'hits'
  to make your compression less effective. But when a xxxK news batch
  goes through, the modem is going to have many chances to realize
  compression is not a good idea.
  
  I'll agree with Tom that perhaps asking this of Telebit might be much, as 
  the problem is more difficult than the off the cuff analysis would indicate.
  Maybe they can do it later.... (And maybe they can do V.32 and FAX and
  maybe the Usenet deal will come back and maybe I'll someday afford one
  :-)

> 				tom lane
> Internet: tgl@zog.cs.cmu.edu
> UUCP: <your favorite internet/arpanet gateway>!zog.cs.cmu.edu!tgl
> BITNET: tgl%zog.cs.cmu.edu@cmuccvma


-- 
Dragos Ruiu   ruiu@dragos.UUCP
        ...alberta!dragos!ruiu   "cat ansi.c | grep -v noalias >proper.c"