Xref: utzoo comp.theory:1527 gnu.misc.discuss:2393 sci.crypt:4176 sci.misc:4798
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!dali.cs.montana.edu!ogicse!hsdndev!cmcl2!uupsi!sunic!fuug!demos!news-server
From: Serge.Volkoff@ippi.msk.su (Serge Volkoff at IPPI Moscow USSR)
Newsgroups: comp.theory,gnu.misc.discuss,sci.crypt,sci.misc
Subject: Re: CALL FOR DISCUSSION: Create sci.compression?
Keywords: sci.compression data compression new newsgroup creation
Message-ID: <9102120717.AA17688@jumbo.hq.demos.su>
Date: 12 Feb 91 06:32:58 GMT
Sender: news-server@jumbo.hq.demos.su
Reply-To: Serge.Volkoff@ippi.msk.su
Followup-To: news.groups
Organization: unknown
Lines: 39

In <1991Feb11.004336.26106@rand.org> Ed Hall <edhall@rand.org> writes:

>Data compression is meaningless outside of a computer context; it is
>arguably as much a part of computer science as compiler writing or
>microprocessor design.  Although little practical cryptography is
>done without computers these days, for historical reasons
>cryptography is viewed as having an existence outside of the aegis
>of computer science.

Let me note that data compression, as part of information and coding theory,
is generally called `source modeling and coding.' Of course, many people view
it as a purely computer-science topic, but since it originated in Shannon's
information-theoretic work and was continued by such brilliant scientists as
David Huffman, Robert Gallager, Abraham Lempel, Jacob Ziv, Jorma Rissanen,
and Glen Langdon, it is certainly part of information theory. Just open `IEEE
Transactions on Information Theory' and you will find at least one paper
devoted to source coding in every issue. Source coding isn't as old as
cryptography, but it did appear earlier than most computer-science topics.

I wonder why such weak methods are always chosen for practical archivers.
Isn't it because of ignorance of the theoretical results in source coding? All
archivers known to me are based on numerous modifications of the two Lempel-Ziv
algorithms (LZ77 and LZ78). As source coding analysis shows, these are only
asymptotically optimal, and their coding rate (the inverse of the compression
ratio) approaches the entropy extremely slowly. Very powerful universal modeling
and coding methods exist (due to Rissanen and Langdon, Cleary and Witten, et
al.), primarily those based on variable-context Markov models and the arithmetic
coding technique, a purely theoretical achievement of Jorma Rissanen. These
methods are several times slower, but they achieve good compression ratios for
_all_ files, often _far_ beyond the current limits set by the best existing
programs: compress, pkarc, pkzip, lharc, and arj.
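To make the point about slow convergence concrete, here is a minimal Python
sketch (an illustration, not anyone's archiver) that measures bits per symbol
for two schemes on the same data: a plain LZ78 parse with a naive
(index, next-symbol) encoding, and an adaptive context model whose ideal code
length, the sum of -log2 p over the symbols, is what an arithmetic coder
achieves to within about two bits for the whole message. Assumptions made only
for this example: the hypothetical helper names, the LZ78 bit accounting, the
add-1/2 (Krichevsky-Trofimov) probability estimator, a fixed order k standing
in for the variable-context models above, and a synthetic binary Markov test
source. Since each raw symbol is one bit here, bits/symbol is the coding rate.

```python
import math
import random
from collections import Counter, defaultdict


def lz78_parse(data):
    """Split data into LZ78 phrases, returned as (prefix_index, new_symbol) pairs."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    phrases = []
    current = ""
    for symbol in data:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate  # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], symbol))
            dictionary[candidate] = len(dictionary)
            current = ""
    if current:  # input ended inside a known phrase: emit its index only
        phrases.append((dictionary[current], None))
    return phrases


def lz78_code_length(data):
    """Bits for a naive LZ78 encoding (hypothetical bit accounting for this sketch)."""
    symbol_bits = math.ceil(math.log2(max(2, len(set(data)))))
    bits = 0
    for i, (_, symbol) in enumerate(lz78_parse(data), start=1):
        bits += math.ceil(math.log2(i))  # the dictionary holds i entries at phrase i
        if symbol is not None:
            bits += symbol_bits
    return bits


def context_model_code_length(data, order):
    """Ideal arithmetic-coding cost (sum of -log2 p) of an adaptive order-k model."""
    alphabet_size = len(set(data))  # assumes the decoder knows the alphabet
    counts = defaultdict(Counter)  # context string -> symbol counts seen so far
    bits = 0.0
    for i, symbol in enumerate(data):
        seen = counts[data[max(0, i - order):i]]
        total = sum(seen.values())
        # Krichevsky-Trofimov (add-1/2) estimate of P(symbol | context)
        p = (seen[symbol] + 0.5) / (total + 0.5 * alphabet_size)
        bits -= math.log2(p)
        seen[symbol] += 1  # update after coding, as a decoder could
    return bits


def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def markov_source(n, switch=0.1, seed=1):
    """Binary order-1 Markov source over {'a', 'b'}: flips symbol with prob. `switch`."""
    rng = random.Random(seed)
    state, out = "a", []
    for _ in range(n):
        out.append(state)
        if rng.random() < switch:
            state = "b" if state == "a" else "a"
    return "".join(out)


if __name__ == "__main__":
    data = markov_source(20000)
    n = len(data)
    print(f"LZ78 (naive coding): {lz78_code_length(data) / n:.3f} bits/symbol")
    print(f"order-1 model + AC : {context_model_code_length(data, 1) / n:.3f} bits/symbol")
    print(f"source entropy     : {binary_entropy(0.1):.3f} bits/symbol")
```

Run as a script, it prints both rates next to the source's entropy rate of
about 0.469 bits/symbol. On inputs of this size the LZ78 figure typically sits
noticeably above the context-model figure, which is the slow approach to the
entropy described above; the context model pays only a small learning cost per
context.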

I vote YES for sci.compression. It would be exciting to have an opportunity
to discuss this topic with both software designers and researchers working in
source coding and computer science.
 __                            _ _
(__` _  _ _  _   |  | _ ||, _ |_|_
.__)(-'| (_)(-'   \/ (_)||\(_)| |
         ._)