Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!unmvax!pprg.unm.edu!hc!lll-winken!uunet!rosevax!ernie.Rosemount.COM!merlyn
From: merlyn@ernie.Rosemount.COM (Brian Westley)
Newsgroups: comp.sys.mac
Subject: compression thoughts
Message-ID: <7390@rosevax.Rosemount.COM>
Date: 16 Mar 89 15:21:34 GMT
Sender: news@rosevax.Rosemount.COM
Reply-To: merlyn@ernie.rosemount.com
Lines: 28

btoa/atob is better than binhex, and btoa can be improved upon slightly (about 1% smaller).

btoa encodes 4 bytes into 5 base-85 digits ('!' to 'u'), plus 'x' for end of data and 'z' for 4 bytes of zero. It also adds a newline every 78 characters to keep mailers happy. About 80% of these newlines can be eliminated if, for each line, the rightmost '!' is turned into a newline (unless it is the first character in the line, or it is the second character and the first is '.'; otherwise the mailers may get confused). When decoding, any newline that comes before the 79th character is turned back into '!'. A "newline" here means any sequence of newlines/carriage returns, in case the file has been double-spaced, translated, had gaps inserted, etc.

'!' is chosen because it is zero in base-85 and occurs most frequently. It can be made to appear even more frequently by using base-94 ('!' to '}'), with '~' for 4 bytes of zero and ' ' at the beginning of a line for end of data (mailers may clip trailing spaces, but this is not a trailing space; checksum data follows).

Also, put the ASCII unpacking into the compression program so it can do both at once. Which is needed for...

An auto-unpacking init: it patches the open-file routine, and when a file is created that looks like a compressed or compressed-and-ASCIIfied file, it "monitors" the data written and starts unpacking any data that looks valid. Files are unpacked automagically as they are downloaded. Neat, huh?

If I had time, I'd do it, but there's a good chance I won't have time.
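A minimal Python sketch of the btoa-style group encoding and the proposed '!'-to-newline trick, assuming big-endian 4-byte groups, a final partial group padded with zero bytes, and 78-character lines. The real btoa's trailer and checksum handling is not reproduced; the original byte count is simply passed to the decoder instead. The names btoa_encode, wrap, unwrap and btoa_decode are hypothetical, and the base-94 variant is not shown.

```python
import re

LINE_LEN = 78     # a newline after exactly 78 chars is an ordinary line break
ZERO = ord("!")   # '!' is digit 0 in base-85

def btoa_encode(data: bytes) -> str:
    """4-byte groups -> 5 base-85 digits ('!'..'u'), 'z' for an all-zero
    group, 'x' for end of data. Assumes zero-padding of the last group."""
    padded = data + b"\0" * (-len(data) % 4)
    out = []
    for i in range(0, len(padded), 4):
        word = int.from_bytes(padded[i:i + 4], "big")
        if word == 0:
            out.append("z")                    # 4 zero bytes in one character
            continue
        digits = []
        for _ in range(5):                     # 85**5 > 2**32, so 5 digits suffice
            word, r = divmod(word, 85)
            digits.append(chr(ZERO + r))
        out.append("".join(reversed(digits)))  # most significant digit first
    out.append("x")
    return "".join(out)

def wrap(stream: str) -> str:
    """Split into lines of at most 78 chars; where allowed, the rightmost
    '!' in each 78-char window becomes the line break itself."""
    lines = []
    pos = 0
    while len(stream) - pos > LINE_LEN:
        window = stream[pos:pos + LINE_LEN]
        k = window.rfind("!")
        # Never produce an empty line or a line consisting of a lone ".".
        if k > 1 or (k == 1 and window[0] != "."):
            lines.append(window[:k])           # the '!' is replaced by the newline
            pos += k + 1
        else:
            lines.append(window)               # plain 78-char line, as in btoa
            pos += LINE_LEN
    lines.append(stream[pos:])                 # final line, ends with 'x'
    return "\n".join(lines) + "\n"

def unwrap(text: str) -> str:
    """Treat any run of CR/LF as one newline; a newline arriving before
    the 79th character stands for an eliminated '!'."""
    out = []
    for line in re.split(r"[\r\n]+", text):
        if not line:
            continue
        out.append(line)
        if len(line) < LINE_LEN:
            out.append("!")                    # harmless after the final 'x'
    return "".join(out)

def btoa_decode(stream: str, length: int) -> bytes:
    out = bytearray()
    group = []
    for ch in stream:
        if ch == "x":                          # end of data
            break
        if ch == "z":
            out += bytes(4)
            continue
        group.append(ord(ch) - ZERO)
        if len(group) == 5:
            word = 0
            for d in group:
                word = word * 85 + d
            out += word.to_bytes(4, "big")
            group = []
    return bytes(out[:length])                 # drop the zero padding
```

A round trip is btoa_decode(unwrap(wrap(btoa_encode(data))), len(data)), which should give back data even if the text has been double-spaced with CR/LF pairs in transit. A '!' consumed by wrap costs nothing: the newline takes its place, so only the plain 78-character lines still carry an extra byte.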
----
Merlyn LeRoy

PS: make sure the auto-unpacking init doesn't do its thing when a file is being compressed (vs. downloaded).