Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!mailrus!tut.cis.ohio-state.edu!bloom-beacon!gatech!udel!rochester!pt.cs.cmu.edu!ZOG.CS.CMU.EDU!tgl From: tgl@ZOG.CS.CMU.EDU (Tom Lane) Newsgroups: comp.dcom.modems Subject: Re: Merits of compression Message-ID: <1529@pt.cs.cmu.edu> Date: 26 Apr 88 13:15:06 GMT References: <580@spectrix.UUCP> <406@dragos.UUCP> <1473@pt.cs.cmu.edu> Sender: netnews@pt.cs.cmu.edu Organization: Carnegie-Mellon University, CS/RI Lines: 47 Summary: It's trickier than it looks. In article <580@spectrix.UUCP>, John_M@spectrix.UUCP (John Macdonald) writes: > There is a good solution to this! For each chunk, do the compression. > If the result is smaller, use, otherwise send the original. Precede > this choice with a one-bit flag to show whether the next chunk is compressed. > [...] In both cases, you > update the tables as if you did the compression. > > In the receiver, you read and check the flag. If it indicates compression, > then uncompress as usual. If not, *COMPRESS* the incoming data to keep > the tables up-to-date, but keep the uncompressed data to pass on and throw > away the compressed data. Yeah, that's the obvious thing to do. The trouble is that compression and decompression tables are not the same thing; if you think about it, they perform reverse functions, so it's not surprising that they need to be indexed differently. For LZW compression like the TrailBlazer uses, the compression table is usually a hash table indexed by ; while the decompression table can be a simple array indexed by . I haven't come up with any better way of doing this idea than having the receiver maintain BOTH compression and decompression tables, updating both tables whenever a packet is received (using either the compression or decompression algorithm, as required). This is a nontrivial hit in CPU terms, but the place where it really hurts you is memory space. A compression table for 12-bit-symbol LZW (TB algorithm) is 24KB or more, depending on what kind of hash table load factor you're willing to tolerate (= more CPU hit...). A TrailBlazer hasn't got that kind of memory to spare, I would imagine. On the other hand, if you are talking compression/decompression in the mainframe rather than the modem, and you've got oodles of memory space and CPU cycles to play with, you could do it. I guess that's another argument for my original point (which was that the modem is the wrong place to do the compression). Anybody know anything about cheap ways to build doubly-indexed tables?? You could become the next initial in LZW! PS: to really deserve that initial you should also figure out a really good yet cheap algorithm for deciding when to flush the tables... I seem to recall some discussion of that in the COMPRESS mailing list, but I don't think it was ever solved satisfactorily. -- tom lane Internet: tgl@zog.cs.cmu.edu UUCP: !zog.cs.cmu.edu!tgl BITNET: tgl%zog.cs.cmu.edu@cmuccvma