Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!batcomputer!cornell!rochester!pt.cs.cmu.edu!g.gp.cs.cmu.edu!tgl
From: tgl@g.gp.cs.cmu.edu (Tom Lane)
Newsgroups: comp.compression
Subject: Re: COMPRESSING of binary data into mailable ASCII
Summary: you can't use all ninety-five ASCII symbols
Message-ID: <12485@pt.cs.cmu.edu>
Date: 26 Mar 91 15:32:30 GMT
References: <1991Mar26.024425.5621@zorch.SF-Bay.ORG>
Organization: Carnegie-Mellon University, CS/RI
Lines: 21

There's a vital point that several posters in this thread seem to have
missed. If you want a representation that is safe to mail around the
world, *you can't use all ninety-five ASCII symbols*.

Some mailers translate mail into other character sets (can you say
EBCDIC?). Some of the less common ASCII symbols do not appear in EBCDIC,
and unfortunately there isn't a standardized translation.

The authors of btoa did the necessary research and concluded that there
are eighty-five ASCII symbols that are safe to mail. So that's why btoa
uses base 85. Any practical new scheme for mailable representations is
not going to be much better.

Incidentally, by my calculation, 85 symbols give you log2(85) = 6.41 bits
per symbol, where 95 give you 6.57. So we are only talking a three
percent difference in size anyway. (I have no idea whether btoa reaches
the full 6.41 bits per symbol. More likely they translate four source
bytes into five output symbols, or 6.4 bits/symbol.)

--
			tom lane
Internet: tgl@cs.cmu.edu
BITNET: tgl%cs.cmu.edu@cmuccvma
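
For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It recomputes the bits-per-symbol figures and shows one way four source bytes could be written as five base-85 digits. The names bits_per_symbol and group_to_base85_digits are hypothetical, made up for this illustration. The grouping follows the guess in the post and is not taken from the btoa source. The sketch also stops at digit values 0..84 and deliberately does not say which 85 characters those digits would map to.

```python
import math


def bits_per_symbol(alphabet_size):
    """Information carried by one symbol drawn from an alphabet of this size."""
    return math.log2(alphabet_size)


def group_to_base85_digits(four_bytes):
    """Hypothetical helper, not btoa's actual code: treat 4 bytes as one
    32-bit big-endian number and return it as five base-85 digits
    (values 0..84), most significant digit first."""
    if len(four_bytes) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = int.from_bytes(four_bytes, "big")
    digits = []
    for _ in range(5):
        value, digit = divmod(value, 85)  # peel off the lowest digit
        digits.append(digit)
    return digits[::-1]


safe_set = bits_per_symbol(85)    # the mail-safe subset
full_ascii = bits_per_symbol(95)  # all printable ASCII symbols
packed = 4 * 8 / 5                # 32 bits spread over 5 symbols
size_penalty = full_ascii / safe_set - 1

# Five base-85 digits can hold any 32-bit value only if 85**5 >= 2**32.
assert 85 ** 5 >= 2 ** 32

print(f"85 symbols:         {safe_set:.2f} bits/symbol")
print(f"95 symbols:         {full_ascii:.2f} bits/symbol")
print(f"4 bytes -> 5 chars: {packed:.2f} bits/symbol")
print(f"extra size with 85 instead of 95 symbols: {size_penalty:.1%}")
print(group_to_base85_digits(b"\xff\xff\xff\xff"))
```

The ratio comes out at about 2.5 percent, in line with the rough three percent figure. The four-into-five grouping works because 85**5 is slightly larger than 2**32, so it loses almost nothing against the theoretical 6.41 bits per symbol.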