Path: utzoo!mnetor!uunet!ncc!alberta!edson!tic!dragos!work
From: work@dragos.UUCP (Dragos Ruiu)
Newsgroups: comp.dcom.modems
Subject: Re: Merits of compression
Message-ID: <410@dragos.UUCP>
Date: 27 Apr 88 21:33:58 GMT
References: <580@spectrix.UUCP> <406@dragos.UUCP> <1473@pt.cs.cmu.edu> <1529@pt.cs.cmu.edu>
Organization: UNIX Device, Canada
Lines: 57
Summary: You can do 12 bit LZW with just 32K

In article <1529@pt.cs.cmu.edu>, tgl@ZOG.CS.CMU.EDU (Tom Lane) writes:
> In article <580@spectrix.UUCP>, John_M@spectrix.UUCP (John Macdonald) writes:
> [Talks about compressing incoming uncompressed data to keep LZW tables good]

> Yeah, that's the obvious thing to do.  The trouble is that compression
> and decompression tables are not the same thing; if you think about it,
> they perform reverse functions, so it's not surprising that they need
> to be indexed differently.  For LZW compression like the TrailBlazer
> uses, the compression table is usually a hash table indexed by
> <previous symbol, current character>; while the decompression table can
> be a simple array indexed by <current symbol>.
> 
> I haven't come up with any better way of doing this idea than having
> the receiver maintain BOTH compression and decompression tables,
> updating both tables whenever a packet is received (using either the
> compression or decompression algorithm, as required).  This is a
> nontrivial hit in CPU terms, but the place where it really hurts you is
> memory space.  A compression table for 12-bit-symbol LZW (TB algorithm)
> is 24KB or more, depending on what kind of hash table load factor
> you're willing to tolerate (= more CPU hit...).  A TrailBlazer hasn't
> got that kind of memory to spare, I would imagine.
> 
> -- 
> 				tom lane
> Internet: tgl@zog.cs.cmu.edu
> UUCP: <your favorite internet/arpanet gateway>!zog.cs.cmu.edu!tgl
> BITNET: tgl%zog.cs.cmu.edu@cmuccvma

 First a couple of points:
   -When decide to turn compression off, keeping the tables updated is
   probably of little significance, because the codes being built are
   useless.

   -Last I heard the Telebit was reputed to have 512K of ram.

 You can keep a compression table and a decompression table simultaneously.
 Simply have the hash table indicate entries in the (de)compression table
 instead of keeping the code and the follow. (This saves you memory even
 if you don't care about keeping a decompression table.) The disadvantage
 of this approach is that you take an extra indirection hit on every hash
 table lookup. The hash table gives you the entry number, which you must 
 then look up in the entry table.

 A minimal memory setup for bi-directional compression is about 32K:
  
     10K decompression table:  4096 * (1.5 byte code + 1 byte follow)
     10K compression entry table: "                                "
     12K hash table: 8096 * 1.5 byte compression table entry numbers.

 This has a 50% hash table occupancy.

 And if you wanted to do 'compression' on the decompressed input like
 John MacDonald suggested, you have to add another 12K hash table that
 points to entries in the decompression table.
-- 
Dragos Ruiu   ruiu@dragos.UUCP    "One can ceratinly imagine the myriad of
        ...alberta!dragos!ruiu     uses for a hand-held iguana maker." -Hobbes