Path: utzoo!utgpu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!mips!daver!zorch!xanthian
From: xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan)
Newsgroups: alt.sources.d
Subject: Re: A readable, robust encoding for source postings
Message-ID: <1990Dec31.232624.23510@zorch.SF-Bay.ORG>
Date: 31 Dec 90 23:26:24 GMT
References: <1990Dec29.114801.5895@Daisy.EE.UND.AC.ZA> <1990Dec30.170302.21665@druid.uucp> <6540@uqcspe.cs.uq.oz.au>
Organization: SF-Bay Public-Access Unix
Lines: 99

rhys@batserver.cs.uq.oz.au writes:
> darcy@druid.uucp (D'Arcy J.M. Cain) writes:
>>In article <1990Dec29.114801.5895@Daisy.EE.UND.AC.ZA> Alan P. Barrett writes:
>>> [...]
>>>I think that the correct way to fix this is to use an encoding that is
>>>both readable and robust. A version of shar that does stuff like
>>>encoding tabs as \t and wrapping lines in a reversible way would do it.
>>I posted my genfiles program which I hoped would be a jumping off point for
>>such an effort. Has anyone looked at it and have suggestions to enhance
>>the protocols I suggested?
>I missed the original discussion, so I may be repeating things, but
>the central problem I think there will be in getting a new transmission
>standard off the ground is actually making it a standard :-). unshar,
>uuencode and the like are very widespread, and trying to shake their
>ground may be very hard. Maybe in the interim a cut-down "encoder"
>is needed that can be wrapped-up in a shar archive, and will be unpacked,
>compiled and run to unpack the rest. e.g. the shar archive could look
>something like this:
>
>	... head information ...
>	sed ... >/tmp/decode.c <<EOF
>	... source code for decode.c ...
>	EOF
>	cc -o /tmp/decode /tmp/decode.c
>	sed ... | /tmp/decode >file <<EOF
>	... file contents ...
>	EOF
>
>It should be possible to get a very compact decoding program that could
>be wrapped up with the shell archives. Won't solve all the problems
>but may help, as well as its being reasonably compatible with the
>existing shar archiving system.
>Well, that's my thoughts on the matter,
>what do you think?

Problem is, lots of shars are unpacked on systems where the C compiler
command isn't spelled "cc". Lots of shars don't contain C code, and may
be unpacked on systems where, e.g., Modula-2 is the only compilable
language. In fact, I unpack lots of shars on my Amiga, where "sed"
doesn't exist, and the "unshar" program fakes it by knowing the format
of ordinary shar file "sed" commands and doing what's right.

Probably, despite the calls here for clear text, a much more robust way
to transmit source files is the one used in, for example,
comp.binaries.ibm.pc, where the expected resources at a site are
"uudecode", which can be transmitted in clear text as a BASIC or C
program, and some widely available archiving program. The one of choice
now is zoo, but lharc is coming up fast due to a superior packing
algorithm.

Add to that the "brik" CRC check, the zoo internal CRC checks, and the
short-line, limited-character-set uuencode format with line-by-line
checksums, and you have an extremely robust encoding that can transit
ASCII to EBCDIC to ASCII intact, and doesn't challenge the broken news
software that we will always have with us.

The major requirement for this method is that there needs to be a very
explicit clear text explanation of the purpose and contents of the
archive, to let the reader decide whether it is worth unpacking. I'm
not thrilled when I take the time to catenate, uudecode, and unpack an
archive with an interesting description from the PC-clone universe, in
hopes of stealing some code and ideas for a port of the functionality,
only to find out that it doesn't contain the source code I was
expecting. A minimal description should say whether source is included,
and list data types, platforms, compiler technology required,
functionality, and copyright status.

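To make the short-line, limited-character-set idea above concrete, here
is a minimal Python sketch of a uuencode-style line encoder. The 6-bit
packing and the leading length character follow the usual uuencode
layout; the function names and the single trailing checksum character
(sum of the line's bytes modulo 64) are assumptions made up for
illustration, not the format of any particular uuencode variant.

```python
# Sketch only: uuencode-style lines plus a HYPOTHETICAL per-line checksum.

def uu_char(value: int) -> str:
    """Map a 6-bit value to a printable character ('`' stands in for 0)."""
    return "`" if value == 0 else chr(32 + value)


def uu_value(ch: str) -> int:
    """Inverse of uu_char; both ' ' and '`' decode to 0."""
    return (ord(ch) - 32) & 0x3F


def encode_line(chunk: bytes) -> str:
    """Encode up to 45 bytes as: length char, 4 chars per 3 bytes, checksum char."""
    if len(chunk) > 45:
        raise ValueError("at most 45 bytes per line")
    padded = chunk + b"\0" * (-len(chunk) % 3)
    out = [uu_char(len(chunk))]
    for i in range(0, len(padded), 3):
        a, b, c = padded[i:i + 3]
        out.append(uu_char(a >> 2))
        out.append(uu_char(((a & 3) << 4) | (b >> 4)))
        out.append(uu_char(((b & 15) << 2) | (c >> 6)))
        out.append(uu_char(c & 63))
    # Assumed checksum scheme: sum of the original bytes, modulo 64.
    out.append(uu_char(sum(chunk) % 64))
    return "".join(out)


def decode_line(line: str) -> bytes:
    """Decode one line and verify its trailing checksum character."""
    count = uu_value(line[0])
    body, check = line[1:-1], line[-1]
    data = bytearray()
    for i in range(0, len(body), 4):
        w, x, y, z = (uu_value(ch) for ch in body[i:i + 4])
        data += bytes([(w << 2) | (x >> 4),
                       ((x & 15) << 4) | (y >> 2),
                       ((y & 3) << 6) | z])
    result = bytes(data[:count])
    if uu_value(check) != sum(result) % 64:
        raise ValueError("line checksum mismatch")
    return result


def encode(data: bytes) -> list[str]:
    """Split data into 45-byte chunks and encode each as one short line."""
    return [encode_line(data[i:i + 45]) for i in range(0, len(data), 45)]


def decode(lines: list[str]) -> bytes:
    """Decode a list of lines produced by encode()."""
    return b"".join(decode_line(line) for line in lines)
```

Every output line is at most 62 characters long and uses only the
characters "!" through "`", so tabs, lower case letters, and trailing
blanks in the original file never appear on the wire. A one-character
checksum like this is weak and will miss some errors, which is one
reason to keep the archiver's own CRC checks as well.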
Another poster commented that folks on EBCDIC systems have to solve
their own character set and newline encoding problems; that misses the
point. Lots of ASCII to ASCII routings these days pass through a BITNET
host as an intermediary, so even the ASCII destination sites have to be
concerned about finding an encoding that can survive the transit. I
think the current pleas to keep the comp.sources.{unix,games,misc} and
alt.sources postings all clear text, while understandable, are
misdirected on today's net.

And, again to another posting: no, the world is not all becoming
USENet, to live under our way of doing things, just because the nets
are being gatewayed together and sharing code in a much larger
universe. The greater net is a community of peer networks, each with
its own peculiar needs and requirements, not a set of subordinates to
the least organized and most contentious member of the set, USENet.

Thus it behooves us to find methods that cause as few problems as
possible in getting code across this wider universe of communication,
and clear text transmission doesn't seem to be the appropriate
technique anymore.

That's just my opinion, but I pack and unpack a _lot_ of source: .6
gigabytes compressed, at last count, not bad for a personal archive.
That translates into several thousand archives of various sorts that
I've unpacked.

Kent, the man from xanth.