Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!ub!uhura.cc.rochester.edu!rochester!pt.cs.cmu.edu!g.gp.cs.cmu.edu!tgl
From: tgl@g.gp.cs.cmu.edu (Tom Lane)
Newsgroups: comp.compression
Subject: Re: COMPRESSING of binary data into mailable ASCII
Message-ID: <12489@pt.cs.cmu.edu>
Date: 26 Mar 91 20:10:47 GMT
References: <12485@pt.cs.cmu.edu>
Organization: Carnegie-Mellon University, CS/RI
Lines: 33

In article <12485@pt.cs.cmu.edu>, I said:
> The authors of btoa did the necessary research and concluded that there are
> eighty-five ASCII symbols that are safe to mail.  So that's why btoa uses
> base 85.

After looking at the recently posted source code, it's clear that btoa uses
base 85 because that's what you need for 6.4 bits/symbol (four input bytes
map to five output characters).  2^6.4 = 84.45, so base 84 is too small.
They avoid the ASCII characters above 'z' (codes 123-127), plus they leave
five lowercase letters unused.  (Actually they use 'x' and 'z' for special
purposes: 'x' is the EOF mark and 'z' represents four zero input bytes.)

*If* the btoa authors are right about which ASCII characters are safe to
mail (and I am no longer willing to take that as certain) then you could
theoretically go up to base 90 in a mail-safe representation.  But that
gets you log2(90) = 6.49 bits/symbol which is hard to use efficiently,
and it's only 1% less space than btoa's 6.4 bits/symbol.  (In some cases,
btoa's use of 'z' to represent four zeros would buy a lot more than 1%.)

I believe btoa is pretty close to the optimal mailable representation of
arbitrary binary data.  You can't possibly do more than about 2.4% better
(using 94 ASCII characters), and given practical considerations such as
unsafe characters and conversion speed, btoa looks pretty good.  However,
it'd be nice to hear from someone who uses non-ASCII mailers about just
which characters are safe.  (Anybody have an EBCDIC character chart handy?)

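As a rough check on the arithmetic above, here is a minimal Python sketch
(not btoa's actual source).  It computes the space saved by a larger base
relative to 6.4 bits/symbol, finds the smallest base that fits four bytes
into five characters, and packs one four-byte group into five base-85
digits.  The function names are hypothetical, and the choice of '!' as the
first digit character is an assumption made for illustration; the 'x' and
'z' special cases are not handled.

```python
import math

BTOA_BITS = 32 / 5  # four input bytes -> five output characters = 6.4 bits/symbol


def min_base(in_bytes, out_chars):
    """Smallest integer base b such that b**out_chars >= 256**in_bytes."""
    target = 256 ** in_bytes
    base = 1
    while base ** out_chars < target:
        base += 1
    return base


def saving_vs_btoa(base):
    """Fraction of output space saved by an ideal base-`base` coding
    compared with 6.4 bits/symbol."""
    return 1 - BTOA_BITS / math.log2(base)


def encode_group(four_bytes, first_char="!"):
    """Pack exactly 4 bytes into 5 base-85 digits, most significant first.
    ASSUMPTION: digits are the 85 consecutive characters from first_char."""
    if len(four_bytes) != 4:
        raise ValueError("need exactly 4 bytes")
    n = int.from_bytes(four_bytes, "big")
    digits = []
    for _ in range(5):
        n, d = divmod(n, 85)
        digits.append(d)
    return "".join(chr(ord(first_char) + d) for d in reversed(digits))


def decode_group(five_chars, first_char="!"):
    """Inverse of encode_group for a single 5-character group."""
    if len(five_chars) != 5:
        raise ValueError("need exactly 5 characters")
    n = 0
    for c in five_chars:
        n = n * 85 + (ord(c) - ord(first_char))
    return n.to_bytes(4, "big")


if __name__ == "__main__":
    print("smallest base for 4 bytes in 5 chars:", min_base(4, 5))
    for b in (90, 94):
        print(f"base {b}: {math.log2(b):.2f} bits/symbol, "
              f"{100 * saving_vs_btoa(b):.1f}% smaller than btoa")
```

min_base(4, 5) comes out to 85 because 84^5 is just under 2^32 while 85^5
is just over it, which is the same fact as 2^6.4 = 84.45.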
One thing that might be good to avoid is putting '.' at the start of a line,
which btoa will do quite happily.

-- 
			tom lane
Internet: tgl@cs.cmu.edu
BITNET: tgl%cs.cmu.edu@cmuccvma