Path: utzoo!dciem!nrcaer!scs!spl1!laidbak!att!pacbell!ames!ll-xn!mit-eddie!genrad!panda!teddy!jpn
From: jpn@teddy.UUCP (John P. Nelson)
Newsgroups: comp.sources.d
Subject: Re: Poll on shar formats
Message-ID: <4838@teddy.UUCP>
Date: 3 Jun 88 14:19:10 GMT
Article-I.D.: teddy.4838
References: <868@fig.bbn.com> <1494@microsoft.UUCP>
Reply-To: jpn@teddy.UUCP (John P. Nelson)
Organization: GenRad, Inc., Concord, Mass.
Lines: 31

[Discussion of shar prefixing algorithms]

Back when I was mod.sources moderator, I modified my version of "shar"
to pre-scan each file for "dangerous" character sequences.  If no
"dangerous" sequences were found, then no prefixes were used at all,
and "cat" was used to extract the files, not "sed".  This both runs
faster, and makes it easier to extract the file when a "dumb" editor
is the only available means of extracting the files.

In a year and a half of moderating, not one file actually needed
prefixing.  Of course, I didn't repack any shars that I didn't need to,
but I ended up repacking about 1/4 of the submissions.  I did not get
ANY complaints from people about strange sequences causing corruption
of shars.

For some reason, people assume that a dot beginning a line is a
"dangerous sequence":  It is NOT!  What they are thinking of is a dot
ALONE on a line:  This causes some mailers to terminate reading the
mail.  It is silly to prefix EVERY line starting with a dot (nroff
source) because of this.

Other dangerous sequences are a line starting with "From", or a line
starting with the here-document end-of-file marker.  There may be
others, but I cannot recall any.  Note that a leading 'X' is not
dangerous unless you are already using sed 's/^X//' to extract.
--
     john nelson

UUCP:   {decvax,mit-eddie}!genrad!teddy!jpn
smail:  jpn@genrad.com
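
The pre-scan described in the post could look roughly like the sketch below. This is written in Python for illustration only; the modified "shar" itself is not shown in the post, so the function names (is_dangerous, needs_prefix, pack) and the SHAR_EOF marker are assumptions, not the original code. It checks only the three sequences the post names: a dot alone on a line, a line starting with "From", and a line starting with the here-document end-of-file marker. If none is found, the file is packed for extraction with "cat"; otherwise every line gets an 'X' prefix and "sed" strips it.

```python
import shlex

EOF_MARKER = "SHAR_EOF"  # hypothetical here-document marker, not from the post


def is_dangerous(line, eof_marker=EOF_MARKER):
    """True if the line matches one of the sequences named in the post."""
    return (
        line == "."                     # a dot ALONE on a line
        or line.startswith("From")      # a line starting with "From"
        or line.startswith(eof_marker)  # a line starting with the EOF marker
    )


def needs_prefix(text, eof_marker=EOF_MARKER):
    """Pre-scan the whole file; one dangerous line is enough."""
    return any(is_dangerous(line, eof_marker) for line in text.splitlines())


def pack(name, text, eof_marker=EOF_MARKER):
    """Return one shar-style section for the file (assumed layout)."""
    lines = text.splitlines()
    if needs_prefix(text, eof_marker):
        # Prefix every line and let sed strip the prefix on extraction.
        command = "sed 's/^X//'"
        lines = ["X" + line for line in lines]
    else:
        # Nothing dangerous: no prefixes, plain cat extracts the file.
        command = "cat"
    header = f"{command} > {shlex.quote(name)} << '{eof_marker}'"
    # Note: the packed section always ends each line with a newline.
    return "\n".join([header, *lines, eof_marker]) + "\n"
```

With this sketch, nroff source such as ".TH SHAR 1" is packed unprefixed, since a leading dot followed by other text is not treated as dangerous, while a file containing a line with only "." falls back to the 'X' prefix and sed.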