Path: utzoo!yunexus!spectrix!John_M
From: John_M@spectrix.UUCP (John Macdonald)
Newsgroups: comp.dcom.modems
Subject: Re: Merits of compression
Message-ID: <580@spectrix.UUCP>
Date: 25 Apr 88 22:29:38 GMT
Article-I.D.: spectrix.580
Posted: Mon Apr 25 18:29:38 1988
References: <8804110136.AA16978@ucbvax.Berkeley.EDU> <15612@onfcanim.UUCP> <1473@pt.cs.cmu.edu> <406@dragos.UUCP>
Reply-To: jmm@spectrix.UUCP (John Macdonald)
Organization: Spectrix Microsystems Inc., Toronto, Ontario, Canada
Lines: 92

In article <406@dragos.UUCP> work@dragos.UUCP (Dragos Ruiu) writes:
:In article <1473@pt.cs.cmu.edu>, tgl@ZOG.CS.CMU.EDU (Tom Lane) writes:
:> 
:> One of the few bad features of LZW is that it is very hard to turn it on and
:> off "on the fly" like this.  The problem is that the compression and
:> decompression symbol tables must be kept in sync, or decompression will be
:> impossible.  Compressing a chunk of text nearly always gives rise to new
:> entries in the table, so you cannot easily compress a packet and then decide
:> not to send it in compressed form.  (The decompressor must be run against
:> the compressed packet in order to update its own table.) 
:[edit]
:> A simple way of doing it would be to flush both tables whenever you decided
:> that compression should be temporarily disabled.  You then couldn't turn
:> compression back on until you had seen some number N of packets that showed
:> adequate compression *starting from an empty table*.  I suspect that this
:> would hurt your overall results badly, but I don't have any data to prove it.
:> Anybody know of any actual research that's been done along this line?
:
:  From Welch's original paper (IEEE Computer June 1984), he says that LZW
:  has a 5K adaptation zone (he was using fixed 12 bit codes). My personal
:  experience has been that this is a conservative estimate. On my compressed
:  packet machine experiment we found that on the average it took about 1.5-3K
:  before compression started to show a win with an empty table. This surprised
:  us, but we didn't have time to research it thoroughly. So these are all
:  ballpark guestimates.  This was using a fixed 12 bit code with the average
:  files kicking around a Un*x system. A variable bit implementation should
:  find the 'win zone' much quicker.
:   
:  If you flushed the tables when compression started going awry, you would
:  would keep compressing but ignoring the compressed output. When the 
:  compression ratio goes back up again to something you like, you flush
:  the tables again and tell the other side to start decompressing once more.
:  Using Welch's 5K figure you lose 5K to find out the initial compression
:  rate has gone back to an acceptable level and you lose another 5 K before
:  your compression adapts back to a good level. So in this worst case
:  scenario you send 10K of data that could have been compressed. I think
:  that real world use would see you lose a max of about 4K of data that could
:  have been compressed.
:   
:  On the plus side though, you gain automagic compression shutoff, and the
:  benefits in throughput when your data stream sometimes contains
:  pre-compressed data is a *big* win. To get this win you would have to
:  keep track of the compression ratios of the last N packets, and only
:  turn off compression when a history of badly compressed packets shows
:  up. If your history window is too small, you may take too many 4K 'hits'
:  to make your compression less effective. But when a xxxK news batch
:  goes through, the modem is going to have many chances to realize
:  compression is not a good idea.
:  
:  I'll agree with Tom that perhaps asking this of Telebit might be much, as 
:  the problem is more difficult than the off the cuff analysis would indicate.
:  Maybe they can do it later.... (And maybe they can do V.32 and FAX and
:  maybe the Usenet deal will come back and maybe I'll someday afford one
:  :-)
:
:> 				tom lane
:> Internet: tgl@zog.cs.cmu.edu
:> UUCP: <your favorite internet/arpanet gateway>!zog.cs.cmu.edu!tgl
:> BITNET: tgl%zog.cs.cmu.edu@cmuccvma
:
:
:-- 
:Dragos Ruiu   ruiu@dragos.UUCP
:        ...alberta!dragos!ruiu   "cat ansi.c | grep -v noalias >proper.c"


There is a good solution to this!  For each chunk, do the compression.
If the result is smaller, use, otherwise send the original. Precede
this choice with a one-bit flag to show whether the next chunk is compressed.
(In a modem you can get away with sending a one-bit flag, but even if that's
too much bother, a whole byte isn't very expensive.)  In both cases, you
update the tables as if you did the compression.

In the receiver, you read and check the flag.  If it indicates compression,
then uncompress as usual.  If not, *COMPRESS* the incoming data to keep
the tables up-to-date, but keep the uncompressed data to pass on and throw
away the compressed data.

For example:

original text:  ABCD EFGH

compress(ABCD,T1) = (MNO,T2),  compress(EFGH,T2) = (PQRST,T3)

source sends:   yMNO nEFGH    (y and n are the flags)

receiver get this and does:
uncompress(MNO,T1) = (ABCD,T2), compress_and_ignore(EFGH,T2) = (EFGH,T3)

receiver outputs: ABCD EFGH
-- 
John Macdonald   UUCP:    {mnetor,utzoo}             !spectrix!jmm