Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!cs.utexas.edu!uunet!indetech!vsi1!zorch!xanthian
From: xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan)
Newsgroups: comp.compression
Subject: COMPRESSING of binary data into mailable ASCII Re: Encoding of binary data into mailable ASCII
Message-ID: <1991Mar26.024425.5621@zorch.SF-Bay.ORG>
Date: 26 Mar 91 02:44:25 GMT
References:
Distribution: comp
Organization: SF-Bay Public-Access Unix
Lines: 47

d88-jwa@byse.nada.kth.se (Jon Wätte) writes:
> The previously suggested way of using arithmetic coding for
> uuencode-style data encoding into printable characters seems cool, but
> a simple "base 85" mapping from four bytes to five chars (such as used
> by atob) and vice versa does the trick with _very_ close to optimal
> performance. The problem is; it requires longword modulo and/or
> division, which has to be simulated on weaker processors (such as
> 68000 or (yukk !) 8086)

Was that mine? It seems like months since I put that claim up. It had a couple of advantages:

o arithmetic encoding produces its output bitwise anyway, so it is almost no extra trouble to capture it in groups of 13 bits instead of 8 or 16 bits.

o 2^13 = 8192 is just a smidgen less than 95*95 = 9025 (the number of two-character strings over the 95 printable ASCII characters), which makes close to optimal use of two bytes to encode 13 bits; let's see:

    13/16 : 4/5  ::  65/64 : 1/1  ::  65/80 : 64/80

that is, 13/16 = 0.8125 against 4/5 = 0.8, so the encoding of 13 bits into 16 is just a smidgen more efficient than the btoa encoding of 32 bits into 40, if I did that right. It is a bit more work, _except_ where arithmetic encoding or other bit-by-bit compression output is already being done anyway.
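For a concrete comparison of the two mappings, here is a minimal Python sketch; it is not btoa's code or anyone's actual implementation. It assumes the 95 printable characters are ASCII 32 (space) through 126, and it borrows the '!'..'u' alphabet of Adobe's ASCII85 for the base-85 side; btoa's own alphabet, framing, and special cases are not modelled. A string of '0'/'1' stands in for an arithmetic coder's bitwise output, and all function names are hypothetical.

```python
from fractions import Fraction

# Assumption: the 95 printable ASCII characters are codes 32 (space) .. 126 ('~').
B95_FIRST, B95_SIZE = 32, 95
# Assumption: base-85 digits '!' (33) .. 'u' (117), as in Adobe's ASCII85;
# btoa's own alphabet and special cases are not modelled here.
B85_FIRST, B85_SIZE = 33, 85


def bytes_to_bits(data: bytes) -> str:
    """Stand-in for a bitwise coder's output: the input as a '0'/'1' string."""
    return "".join(format(b, "08b") for b in data)


def encode_13bit_pairs(bits: str) -> str:
    """Emit two printable characters per 13 bits (2**13 = 8192 <= 95**2 = 9025)."""
    out = []
    for i in range(0, len(bits), 13):
        v = int(bits[i:i + 13].ljust(13, "0"), 2)  # zero-pad the last group
        hi, lo = divmod(v, B95_SIZE)               # hi <= 86, lo <= 94
        out.append(chr(B95_FIRST + hi) + chr(B95_FIRST + lo))
    return "".join(out)


def decode_13bit_pairs(text: str, nbits: int) -> str:
    """Invert encode_13bit_pairs, dropping the padding bits."""
    groups = []
    for i in range(0, len(text), 2):
        v = (ord(text[i]) - B95_FIRST) * B95_SIZE + (ord(text[i + 1]) - B95_FIRST)
        groups.append(format(v, "013b"))
    return "".join(groups)[:nbits]


def encode_base85(data: bytes) -> str:
    """Map each 4-byte word to 5 base-85 digits; needs a 32-bit divide/modulo."""
    data += b"\0" * (-len(data) % 4)               # zero-pad to whole words
    out = []
    for i in range(0, len(data), 4):
        v = int.from_bytes(data[i:i + 4], "big")   # 0 .. 2**32 - 1 < 85**5
        digits = []
        for _ in range(5):
            v, r = divmod(v, B85_SIZE)
            digits.append(chr(B85_FIRST + r))
        out.append("".join(reversed(digits)))      # most significant digit first
    return "".join(out)


def decode_base85(text: str, nbytes: int) -> bytes:
    """Invert encode_base85, dropping the padding bytes."""
    out = bytearray()
    for i in range(0, len(text), 5):
        v = 0
        for ch in text[i:i + 5]:
            v = v * B85_SIZE + (ord(ch) - B85_FIRST)
        out += v.to_bytes(4, "big")
    return bytes(out[:nbytes])


# 85 is the smallest base whose fifth power covers every 32-bit word.
assert 84 ** 5 < 2 ** 32 <= 85 ** 5
# The efficiency ratio worked out above: (13/16) / (32/40) = 65/64.
assert Fraction(13, 16) / Fraction(32, 40) == Fraction(65, 64)

if __name__ == "__main__":
    msg = b"Hello, world!"            # 13 bytes = 104 bits = 8 groups of 13
    bits = bytes_to_bits(msg)
    packed = encode_13bit_pairs(bits)
    assert decode_13bit_pairs(packed, len(bits)) == bits
    b85 = encode_base85(msg)
    assert decode_base85(b85, len(msg)) == msg
    print(len(packed), len(b85))      # prints "16 20" (base-85 padding is kept)
```

The design difference the quoted post points at shows up in the inner loops: each 13-bit pair needs only a small divide (a value below 8192 by 95), while each base-85 word divides a full 32-bit value by 85, which is what has to be simulated on 68000- and 8086-class processors.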
I'd love to see someone (me, if I were in my right mind) take lharc and replace its Huffman coding step with an arithmetic coding step. If the result were at least as efficient an overall compression method (which seems likely at gut level, since Huffman coding loses efficiency compared to arithmetic coding by having to output a whole number of bits per compressed input symbol), the nice bit-by-bit output would quickly seduce one into adding a command-line option to make the output mailable ASCII instead of the default binary. That would replace the current "compress, then encode for mailing" paradigm with a single-step, integrated solution.

Kent, the man from xanth.
--
Does btoa really only use base _eighty_-five? Doesn't look like it could and still be that close to my a.e. solution in efficiency. A typo, perhaps?