Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!uunet!looking!brad
From: brad@looking.on.ca (Brad Templeton)
Newsgroups: comp.compression
Subject: Re: Modeling vs encoding (Re: Lempel-Ziv v/s huffman encoding)
Message-ID: <1991Apr01.053616.3665@looking.on.ca>
Date: 1 Apr 91 05:36:16 GMT
References: <KENW.91Mar29153846@skyler.arc.ab.ca> <46466@ut-emx.uucp> <1991Mar31.000653.4598@zorch.SF-Bay.ORG> <12546@pt.cs.cmu.edu>
Organization: Looking Glass Software Ltd.
Lines: 22

In article <12546@pt.cs.cmu.edu> tgl@g.gp.cs.cmu.edu (Tom Lane) writes:
>A semiadaptive modeler prescans the file to be compressed, computes
>statistics *for that file*, and then encodes the file accordingly.  This
>avoids the Swahili problem.  With this method you must store/transmit the
>encoding table along with the data, so it loses for small files (the
>overhead of the table is too great).  Also, in many applications having to
>scan the file twice is bad news; you can't compress a stream of data on the
>fly.  This option is pretty much out of favor these days.

Depends what you mean by "out of favour."   One of the most commonly
used compressors in the world, PKZIP, uses exactly this method, and does
well by it.   It is, in fact, the superior method for medium sized files.

So many compression theorists run around assuming files of arbitrary
length, which does not attack real world problems.

Two-pass systems such as ZIP suffer the speed penalty of the 2nd pass,
but quite often the penalty of maintaining the adaptive trees or using
arithmetic coding can be high.  There are linear time adaptive huffman
algorithms, but this is no good if the time per bit is still high.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473