Xref: utzoo alt.sources.d:1240 comp.sources.d:6212
Path: utzoo!utgpu!cs.utexas.edu!uunet!bfmny0!tneff
From: tneff@bfmny0.BFM.COM (Tom Neff)
Newsgroups: alt.sources.d,comp.sources.d
Subject: Re: Read this if you're having trouble unpacking Tcl
Message-ID: <97261639@bfmny0.BFM.COM>
Date: 27 Dec 90 09:03:10 GMT
References: <7372@sugar.hackercorp.com> <88817429@bfmny0.BFM.COM> <1990Dec27.071632.7272@zorch.SF-Bay.ORG>
Reply-To: tneff@bfmny0.BFM.COM (Tom Neff)
Followup-To: alt.sources.d
Distribution: alt
Lines: 152

In article <1990Dec27.071632.7272@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
>tneff@bfmny0.BFM.COM (Tom Neff) writes:
>>If people would just post source to the source newsgroups, instead of
>>this unreadable binary crap, no help file would be necessary.
>
>Well, "unreadable" is a bit much, Karl was very helpful in email helping
>me find the tools to unpack tcl.

Karl is a great guy, and this argument isn't intended to impugn his
character in any way.  By 'unreadable' I mean 'you cannot read it,' not
'you cannot somehow decode or transform it into something readable.'

The following is readable:
------------------------------------------------------------
News is for text, and source newsgroups are for source text.
------------------------------------------------------------

The following is unreadable gibberish:
------------------------------------------------------------
begin 0 gibberish
M'YV03LK<F0,B#4$S;^2 H%,&#QT6(,*X(0-BSILZ<L:4 >%&X)PS<B["(1A&
.SD:$"BUBU+BP(1T7"@" 
 
end
------------------------------------------------------------
...even though it's the same sentence uuencoded and compressed.

>The packaging was justified I think by the more than 50% savings in the
>size of the compressed, uuencoded file over the uncompressed original;
>tcl unpacks into nearly 1200 1K blocks of files.

This saving is illusory for most of Usenet.  The final gibberish
articles may occupy less space by themselves in the news spool directory
than their corresponding, readable cleartext counterparts would, but
that's all.  Anyone with a compressed news feed (that's most of us)
spent MORE time, not less, receiving them, as benchmarks published here
have repeatedly shown.  Anyone with a UNIX, DOS or VMS system will spend
MORE disk space, not less, on intermediate files and such to do all the
concatenating, decoding and decompressing needed to turn the gibberish
into real information than they would simply extracting text from a
shar.
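You can check the "more time, not less" arithmetic for yourself with the Python sketch below. It is only an illustration, and it rests on a few assumptions:

- Python's zlib (DEFLATE) stands in for the LZW compress(1) that posters and batchers actually use.
- Each article is compressed on its own rather than in a batch.
- The "source" is made-up repetitive text.

uuencode turns every 45 input bytes into a 62-byte line. The batcher therefore receives more bytes of near-random text, which it typically cannot shrink back below the poster's compressed size.

```python
import binascii
import zlib

def uuencode_body(data: bytes) -> bytes:
    """uuencode data as binascii does: 45 input bytes per output line.
    The 'begin'/'end' lines are omitted; they add only a few bytes."""
    return b"".join(binascii.b2a_uu(data[i:i + 45])
                    for i in range(0, len(data), 45))

def wire_size(article: bytes) -> int:
    """Bytes the (assumed) batcher sends; zlib stands in for compress(1)."""
    return len(zlib.compress(article, 9))

# Made-up "source" posting; any ordinary plain text would do.
source = b"News is for text, and source newsgroups are for source text.\n" * 200

# Plain shar-style article: compressed once, by the batcher.
plain_wire = wire_size(source)

# Gibberish article: the poster compresses and uuencodes, then the batcher
# compresses the result again because it cannot tell it is already packed.
gibberish = uuencode_body(zlib.compress(source, 9))
gibberish_wire = wire_size(gibberish)

print("plaintext on the wire:", plain_wire)
print("gibberish on the wire:", gibberish_wire)
```

On input like this, the gibberish article comes out larger on the wire. The exact numbers depend on the compressor and the text.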

>Lots of software doesn't transit the news system well in source form, even
>in shars; the extra long lines promoted by both C and awk programming
>styles, embedded control characters in the clear text version, and transit
>between EBCDIC and ASCII hosts can all cause unencoded files to be damaged
>by software problems in the news software (and one must be careful in the
>choice of uuencodes to survive the third danger intact).  

Sure, the phone book or a complete core dump of my system wouldn't
transmit well over Usenet either.  There is an issue of appropriateness
here.  Not EVERY piece of software in any form whatsoever, however
willfully neglectful of net.user convenience, ought automatically to be
considered suitable for posting to source newsgroups.  Specifically, IF
YOU CODE something you intend to post to the net, DON'T use superlong
lines!!  So what if C allows it, or even (as one might manage to
convince oneself) 'promotes' it?  The proliferation of net host
environments DISCOURAGES it, and that ought to be the overriding
consideration for software which aspires to worldwide distribution.
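Below is a minimal Python sketch for checking a posting for superlong lines before it goes out. The 79-column default and the function name are assumptions chosen for illustration, not a documented Usenet rule. Tabs are expanded to 8-column stops before measuring, since that is what a reader's terminal shows.

```python
def long_lines(text: str, limit: int = 79) -> list[tuple[int, int]]:
    """Return (line number, display width) for every line wider than limit.
    The 79-column default is an assumption, not a Usenet rule."""
    return [(n, len(line.expandtabs(8)))
            for n, line in enumerate(text.splitlines(), start=1)
            if len(line.expandtabs(8)) > limit]

print(long_lines("short\n" + "x" * 100 + "\nok\n"))   # [(2, 100)]
```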

>                                                          As the net becomes
>wider and the gateways more diverse, naked or shar-ed source has less and
>less chance of arriving intact, so probably more and more source files will 
>transit the net in compressed encoded form as time goes on.  No sense getting
>abusive about that.

Leave 'abusive' out of it for a minute -- I am standing up for a
principle, and owe it nothing less than my strongest advocacy.  Nobody's
a bad guy here.  Next case.

Yes, the net is more diverse, but resorting to various Captain Midnight
decoder-ring techniques to try to ensure an 'intact' final file is a
Pyrrhic triumph!  The transformations that news gateways perform have a
PURPOSE, remember.

Joe has a Macintosh system where his C source files all have little
binary headers up front and a bunch of ^M-delimited text lines followed
by binary \0's to fill out a sector boundary.  (Just an example, not
necessarily real.)  Janet has an IBM CMS system where all her C source
files are fixed-length-80 EBCDIC records.  The news gateway between Joe's
and Janet's systems automatically transforms article text lines from one
format to the other, so that both of them can read each other's
cleartext happily.  But now Joe gets this idea that the REALLY cool way
to distribute source is as compressed uuencoded stuffit files which
carefully 'preserve' all that precious, delicate Mac text structure --
after all, that's how his Mac BBS buddies prefer to exchange stuff -- so
he publishes his new hello-world program, HELLO.C to comp.sources.foo as
gibberish.  After mucho email exchange, Janet gets hold of something to
uncompress and uudecode the posting.  What does she end up with?
Another binary file with little ^M's sprinkled through it!  The
faithfully 'intact' Mac format is garbage to her!  She now has to find
or write something else to make into a real source file.  Some
convenience.
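Here is a sketch of the kind of "something else" Janet would have to write. The Mac layout it assumes is as made-up as the example itself:

- a fixed-size binary header of hypothetical length `header_len`;
- CR-delimited lines;
- NUL padding at the end.

The function name is hypothetical too.

```python
def mac_text_to_unix(raw: bytes, header_len: int = 0) -> str:
    """Hypothetical converter for the made-up Mac layout in the example:
    skip a fixed-size binary header, drop NUL padding at the end of the
    last sector, and turn CR line ends into LF."""
    body = raw[header_len:].rstrip(b"\0")
    # Normalize any CRLF first so it doesn't become a blank line, then lone CRs.
    body = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Python ships a "mac_roman" codec; plain ASCII source decodes unchanged.
    return body.decode("mac_roman")

raw = b"HDR!" + b"line one\rline two\r" + b"\0" * 8   # 4-byte header is made up
print(repr(mac_text_to_unix(raw, header_len=4)))      # 'line one\nline two\n'
```

On her CMS side Janet would still need an ASCII-to-EBCDIC step (Python's cp037 codec is one candidate), which is left out here.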

The point is that an overly fussy attention to retaining the precise
bitstream that appeared on the author's computer is MISPLACED.  Material
for multi-host distribution should be text.  The precise file structure
of that text should be permitted to vary normally from host to host.

Where the material to be distributed has been written so as to make
platform-independent representation impossible, the content should be
questioned first!  Somebody posted something here recently consisting of
source code and documentation; the usual thing.  But they put EPSON
control codes in the doc file!!  Little ESC-this and ESC-that scattered
all over it for bold, underline etc.  I mean, really now.  Of course
some host along the way obligingly stripped the \033 characters, leaving
weird letters glued to the subheads.  When this was pointed out in
sources.d, what did someone suggest?  You got it -- let's compress and
uuencode the doc files, or heck, the whole thing!!  BULL.  Don't put
silly escape sequences in your doc files in the first place!  Use a
portable representation like 'nroff -man' or stick to plaintext.  Usenet
is not an Epson peripheral.
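To see what the escape-stripping host did, here is a small Python illustration. It assumes the bold codes were Epson ESC/P's ESC E / ESC F pair (emphasized print on/off); the heading text is made up. Dropping only the \033 bytes leaves the E and F glued to the subhead. An nroff -man heading made entirely of printable characters passes through unchanged.

```python
ESC = "\x1b"

# Hypothetical doc file using Epson ESC/P emphasized-mode codes around a subhead:
# ESC E turns emphasized (bold) printing on, ESC F turns it off.
epson_doc = f"{ESC}EINSTALLATION{ESC}F\nType make.\n"

# Portable alternative: an nroff -man section heading, printable ASCII only.
nroff_doc = ".SH INSTALLATION\nType make.\n"

def strip_escapes(text: str) -> str:
    """Mimic a host that silently drops ESC characters and nothing else."""
    return text.replace(ESC, "")

print(strip_escapes(epson_doc).splitlines()[0])   # EINSTALLATIONF
print(strip_escapes(nroff_doc) == nroff_doc)      # True
```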

>Remember, almost nowhere on the net do the *.sources.* files arrive
>without having been compressed somewhere along the way; seeing them
>delivered to you in a compressed format merely defers the final
>unpacking to you, at some cost in convenience but benefit in size and
>robustness of transport. 

This would be true if the news batchers, recognizing that a particular
set of articles was already compressed, could say 'OK skip this one, no
compression needed.'  But that's not how it works.  All news articles
are compressed on their way to you.  Gibberish articles are compressed
TWICE, for a net increase in transmission size and delay.  Delivering
gibberish articles doesn't *defer* unpacking to you: it *adds* another
layer of unpacking which you must do.  Nor is this more robust: a
gibberish article in your spool directory is no likelier to be undamaged
than a plaintext article.  The difference is that when unpacking a
plaintext shar (with word or line counting), you will discover that you
have a bad "arraydef.h" and you must fix it yourself (often possible
from context) or get a short replacement file or patch from the author;
whereas a dropped or mangled line in a uuencoded compressed gibberish
article brings everything to a screeching halt while you wait for tens
or hundreds of K to be reposted before you can unpack.
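The per-file check that lets a shar reader spot the one bad file is sketched below. Many shar generators record each file's character count when packing and compare it against `wc -c` after extraction. This Python version does the same comparison in memory. The function name, file contents and sizes are hypothetical, apart from the `arraydef.h` name borrowed from the text.

```python
def damaged_files(expected_sizes: dict[str, int],
                  extracted: dict[str, bytes]) -> list[str]:
    """Names of files whose extracted size differs from the size recorded
    when the archive was packed (missing files count as size 0)."""
    return sorted(name for name, size in expected_sizes.items()
                  if len(extracted.get(name, b"")) != size)

# Hypothetical archive contents as packed by the author.
original = {
    "arraydef.h": b"#define MAXDIM 10\n#define MAXLEN 80\n",
    "main.c": b"int main(void) { return 0; }\n",
}
expected = {name: len(body) for name, body in original.items()}

# Simulate transit damage: arraydef.h lost its tail, main.c arrived intact.
extracted = dict(original)
extracted["arraydef.h"] = original["arraydef.h"][:20]

print(damaged_files(expected, extracted))   # ['arraydef.h']
```

Only arraydef.h is flagged. Everything else in the archive is usable as extracted.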

>                           No one was going to eyeball that whole
>1.2Mbytes plus packaging before deciding whether to save it off and
>unpack it in any case, and Karl did provide an introduction of sorts to
>the software's purpose.

It isn't necessary to 'eyeball' the whole 1.2MB in order to decide
whether you want it.  You can eyeball the first few components, or
search for something specific you want or dislike (socket calls, VMS
references, the regexp parser, etc.).  You can just scan to get a general
idea of the code quality.  These are real considerations.  Perverting
the source groups with encoded gibberish ignores them.

It is wrong to piggyback an alien distribution scheme onto Usenet.

-- 
"It has come to my attention that there is more  !!!  Tom Neff
than one Jeffrey Miller." -- Jeffrey Miller      ! !  tneff@bfmny0.BFM.COM