Xref: utzoo alt.sources.d:1240 comp.sources.d:6212
Path: utzoo!utgpu!cs.utexas.edu!uunet!bfmny0!tneff
From: tneff@bfmny0.BFM.COM (Tom Neff)
Newsgroups: alt.sources.d,comp.sources.d
Subject: Re: Read this if you're having trouble unpacking Tcl
Message-ID: <97261639@bfmny0.BFM.COM>
Date: 27 Dec 90 09:03:10 GMT
References: <7372@sugar.hackercorp.com> <88817429@bfmny0.BFM.COM> <1990Dec27.071632.7272@zorch.SF-Bay.ORG>
Reply-To: tneff@bfmny0.BFM.COM (Tom Neff)
Followup-To: alt.sources.d
Distribution: alt
Lines: 152

In article <1990Dec27.071632.7272@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
>tneff@bfmny0.BFM.COM (Tom Neff) writes:
>>If people would just post source to the source newsgroups, instead of
>>this unreadable binary crap, no help file would be necessary.
>
>Well, "unreadable" is a bit much, Karl was very helpful in email helping
>me find the tools to unpack tcl.

Karl is a great guy, and this argument isn't intended to impugn his character in any way. By 'unreadable' I mean 'you cannot read it,' not 'you cannot somehow decode or transform it into something readable.'

The following is readable:

------------------------------------------------------------
News is for text, and source newsgroups are for source text.
------------------------------------------------------------

The following is unreadable gibberish:

------------------------------------------------------------
begin 0 gibberish
M'YV03LK%&X)PS
------------------------------------------------------------

>The packaging was justified I think by the more than 50% savings in the
>size of the compressed, uuencoded file over the uncompressed original;
>tcl unpacks into nearly 1200 1K blocks of files.

This savings is illusory for most of Usenet. The final gibberish articles may occupy less space by themselves in the news spool directory than their readable cleartext counterparts would, but that's all.
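To put rough numbers on the spool-versus-wire distinction, here is a minimal Python sketch. It assumes zlib as a stand-in for the LZW compress(1) that news batchers used, omits the uuencode begin/end framing lines, and uses made-up sample text; the helper names are hypothetical. It measures a plaintext posting and its compressed-then-uuencoded twin both as stored in the spool and after the batcher compresses each one again.

```python
import binascii
import random
import zlib


def uuencode_body(data: bytes) -> bytes:
    """uuencode-style body: each 45-byte chunk becomes one 61-character line.

    The 'begin'/'end' framing lines are omitted; they add only a few bytes.
    """
    return b"".join(binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45))


def batched_size(article: bytes) -> int:
    """Bytes sent over the wire once the news batcher compresses the article.

    Assumption: zlib stands in for the compress(1) that batchers actually ran.
    """
    return len(zlib.compress(article, 9))


# A made-up, C-flavoured plaintext "source posting" (hypothetical sample data).
rng = random.Random(1990)
tokens = ["int", "char", "return", "if", "else", "while", "for",
          "struct", "static", "void", "NULL", "buf", "len", "argc", "argv"]
source = "\n".join(
    " ".join(rng.choice(tokens) for _ in range(8)) + ";" for _ in range(3000)
).encode("ascii")

# The "gibberish" article: compressed by the poster, then uuencoded for posting.
gibberish = uuencode_body(zlib.compress(source, 9))

print("spool: plaintext", len(source), "bytes, gibberish", len(gibberish), "bytes")
print("wire:  plaintext", batched_size(source), "bytes, gibberish",
      batched_size(gibberish), "bytes")
```

With input like this the gibberish copy is far smaller on disk, yet it costs slightly more on the wire: the batcher's second pass can at best squeeze the uuencoding's 6-bits-per-character overhead back out, and it cannot shrink data that was already compressed.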
Anyone with a compressed news feed (that's most of us) spent MORE, not less, time receiving them, as benchmarks published here have repeatedly shown. Anyone with a UNIX, DOS or VMS system will spend MORE, not less, disk space on intermediate files and such to do all the concatenating, decoding and decompressing necessary to turn the gibberish into real information than they would have spent simply extracting text from a shar.

>Lots of software doesn't transit the news system well in source form, even
>in shars; the extra long lines promoted by both C and awk programming
>styles, embedded control characters in the clear text version, and transit
>between EBCDIC and ASCII hosts can all cause unencoded files to be damaged
>by software problems in the news software (and one must be careful in the
>choice of uuencodes to survive the third danger intact).

Sure, the phone book or a complete core dump of my system wouldn't transmit well over Usenet either. There is an issue of appropriateness here. Not EVERY piece of software in any form whatsoever, however willfully neglectful of net.user convenience, ought automatically to be considered suitable for posting to source newsgroups.

Specifically, IF YOU CODE something you intend to post to the net, DON'T use superlong lines!! So what if C allows it, or even (as one might manage to convince oneself) 'promotes' it? The proliferation of net host environments DISCOURAGES it, and that ought to be the overriding consideration for software which aspires to worldwide distribution.

> As the net becomes
>wider and the gateways more diverse, naked or shar-ed source has less and
>less chance of arriving intact, so probably more and more source files will
>transit the net in compressed encoded form as time goes on. No sense getting
>abusive about that.

Leave 'abusive' out of it for a minute -- I am standing up for a principle, and owe it nothing less than my strongest advocacy. Nobody's a bad guy here. Next case.
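A poster who wants to honour the no-superlong-lines rule can check files before packing them. The sketch below is a hypothetical Python helper, not an existing tool; the 79-column limit is an assumed conservative threshold rather than a documented Usenet limit, and tabs are expanded to 8-column stops before measuring.

```python
import sys

# Assumed threshold: 79 columns is a common conservative choice, not a Usenet rule.
MAX_COLUMNS = 79


def overlong_lines(text: str, limit: int = MAX_COLUMNS) -> list[tuple[int, int]]:
    """Return (line_number, width) for every line wider than `limit` columns.

    Tabs are expanded to 8-column stops so the width matches what a terminal shows.
    """
    report = []
    for number, line in enumerate(text.splitlines(), start=1):
        width = len(line.expandtabs(8))
        if width > limit:
            report.append((number, width))
    return report


if __name__ == "__main__":
    # Usage: python overlong.py file.c file.h ...  (script name is hypothetical)
    for path in sys.argv[1:]:
        with open(path, encoding="ascii", errors="replace") as handle:
            for number, width in overlong_lines(handle.read()):
                print(f"{path}:{number}: {width} columns")
```

Running it over every file before building the shar catches the problem at the source, where it is cheap to fix, instead of leaving it for some gateway to break.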
Yes, the net is more diverse, but resorting to various Captain Midnight coder-ring techniques to try and assure an 'intact' final file is a Pyrrhic triumph! The transformations that news gateways perform have a PURPOSE, remember.

Joe has a Macintosh system where his C source files all have little binary headers up front and a bunch of ^M-delimited text lines followed by binary \0's to fill out a sector boundary. (Just an example, not necessarily real.) Janet has an IBM CMS system where all her C source files are fixed-length-80 EBCDIC records. The news gateway between Joe and Janet's systems automatically transforms article text lines from one format to the other, so that both of them can read each other's cleartext happily.

But now Joe gets this idea that the REALLY cool way to distribute source is as compressed uuencoded stuffit files which carefully 'preserve' all that precious, delicate Mac text structure -- after all, that's how his Mac BBS buddies prefer to exchange stuff -- so he publishes his new hello-world program, HELLO.C, to comp.sources.foo as gibberish. After mucho email exchange, Janet gets hold of something to uncompress and uudecode the posting. What does she end up with? Another binary file with little ^M's sprinkled through it! The faithfully 'intact' Mac format is garbage to her! She now has to find or write something else to turn it into a real source file. Some convenience.

The point is that an overly fussy attention to retaining the precise bitstream that appeared on the author's computer is MISPLACED. Material for multi-host distribution should be text. The precise file structure of that text should be permitted to vary normally from host to host. Where the material to be distributed has been written so as to make platform-independent representation impossible, the content should be questioned first!

Somebody posted something here recently consisting of source code and documentation; the usual thing.
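The kind of translation a text-aware gateway performs in the Joe and Janet example can be sketched in Python. Everything here is a hypothetical model, not how any real gateway was written: Joe's file is assumed to be CR-delimited and NUL-padded (the binary header is left out), and Janet's CMS side is modelled as blank-padded 80-byte records in EBCDIC code page 037, using Python's `cp037` codec.

```python
# Hypothetical model of Joe's Mac file: CR (^M) line ends, NUL padding at the end.
joe_file = (
    b'#include <stdio.h>\rmain()\r{\r    printf("hello, world\\n");\r}\r'
    + b"\0" * 12
)


def mac_to_text(raw: bytes) -> str:
    """Drop the NUL sector padding and turn CR line ends into host newlines."""
    return raw.rstrip(b"\0").decode("ascii").replace("\r", "\n")


def to_cms_records(text: str, width: int = 80) -> list[bytes]:
    """Turn each text line into a blank-padded, fixed-width EBCDIC record."""
    records = []
    for line in text.splitlines():
        if len(line) > width:
            # A superlong line simply has no faithful fixed-record representation.
            raise ValueError(f"line too long for a {width}-column record: {line!r}")
        records.append(line.ljust(width).encode("cp037"))
    return records


# What a text gateway hands Janet: one readable 80-byte record per source line.
records = to_cms_records(mac_to_text(joe_file))

# What a byte-for-byte "intact" copy hands her: no newline anywhere, just ^M's.
print(b"\n" in joe_file, joe_file.count(b"\r"))
```

The conversion only works because it treats the article as lines of text; once the bytes are frozen inside compressed uuencoded packaging, no gateway can apply it, and Janet inherits the ^M's.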
But they put EPSON control codes in the doc file!! Little ESC-this and ESC-that scattered all over it for bold, underline etc. I mean, really now. Of course some host along the way obligingly stripped the \033 characters, leaving weird letters glued to the subheads. When this was pointed out in sources.d, what did someone suggest? You got it -- let's compress and uuencode the doc files, or heck, the whole thing!!

BULL. Don't put silly escape sequences in your doc files in the first place! Use a portable representation like 'nroff -man' or stick to plaintext. Usenet is not an Epson peripheral.

>Remember, almost nowhere on the net do the *.sources.* files arrive
>without having been compressed somewhere along the way; seeing them
>delivered to you in a compressed format merely defers the final
>unpacking to you, at some cost in convenience but benefit in size and
>robustness of transport.

This would be true if the news batchers, recognizing that a particular set of articles was already compressed, could say 'OK skip this one, no compression needed.' But that's not how it works. All news articles are compressed on their way to you. Gibberish articles are compressed TWICE, for a net gain in transmission size and delay.

Delivering gibberish articles doesn't *defer* unpacking to you: it *adds* another layer of unpacking which you must do.

Nor is this more robust: a gibberish article in your spool directory is no likelier to be undamaged than a plaintext article. The difference is that when unpacking a plaintext shar (with word or line counting), you will discover that you have a bad "arraydef.h" and you must fix it yourself (often possible from context) or get a short replacement file or patch from the author; whereas a dropped or mangled line in a uuencoded compressed gibberish article brings everything to a screeching halt while you wait for tens or hundreds of K to be reposted before you can unpack.
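The difference in failure modes can be sketched in Python. The archive layout, file contents and helper names are hypothetical; the per-file character count mimics the `wc`-based size check that many shar scripts run after extracting each file, and zlib stands in for compress(1).

```python
import zlib


def pack(files: dict[str, str]) -> dict[str, tuple[int, str]]:
    """Record each file's character count alongside its text, as a shar size check does."""
    return {name: (len(text), text) for name, text in files.items()}


def damaged_files(archive: dict[str, tuple[int, str]]) -> list[str]:
    """Names of files whose received text no longer matches the recorded count."""
    return [name for name, (count, text) in archive.items() if len(text) != count]


# Hypothetical three-file posting.
originals = {
    "main.c": "#include \"arraydef.h\"\nint main(void) { return 0; }\n",
    "arraydef.h": "#define NELEM(a) (sizeof(a) / sizeof((a)[0]))\n#define MAXLEN 80\n",
    "README": "Type make to build.\n",
}

archive = pack(originals)

# Simulate transit damage: the first line of arraydef.h is lost.
count, text = archive["arraydef.h"]
archive["arraydef.h"] = (count, text.split("\n", 1)[1])
print("needs repair:", damaged_files(archive))

# The same kind of loss inside one compressed blob takes every file down with it.
blob = zlib.compress("".join(originals.values()).encode("ascii"))
mangled = blob[:10] + blob[20:]  # ten bytes dropped in transit
try:
    zlib.decompress(mangled)
    compressed_failed = False
except zlib.error as exc:
    compressed_failed = True
    print("compressed archive unusable:", exc)
```

Only `arraydef.h` is flagged, so the rest of the plaintext archive stays usable while that one file is repaired; `zlib.decompress` on the damaged blob returns nothing at all.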
> No one was going to eyeball that whole
>1.2Mbytes plus packaging before deciding whether to save it off and
>unpack it in any case, and Karl did provide an introduction of sorts to
>the software's purpose.

It isn't necessary to 'eyeball' the whole 1.2MB in order to decide whether you want it. One can eyeball the first few components, or search for something specific you want or dislike (socket calls, VMS references, the regexp parser, etc). One can just scan to get a general idea of the code quality. These are real considerations. Perverting the source groups with encoded gibberish ignores them. It is wrong to piggyback an alien distribution scheme onto Usenet.

--
"It has come to my attention that there is more      !!!   Tom Neff
than one Jeffrey Miller." -- Jeffrey Miller          ! !   tneff@bfmny0.BFM.COM