Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 ARLENE 1/1/85; site dalcs.UUCP
Path: utzoo!utcsri!utai!garfield!dalcs!silvert
From: silvert@dalcs.UUCP (Bill Silvert)
Newsgroups: net.micro,net.micro.atari16
Subject: New way to post binaries
Message-ID: <2035@dalcs.UUCP>
Date: Sat, 18-Oct-86 10:59:09 EDT
Article-I.D.: dalcs.2035
Posted: Sat Oct 18 10:59:09 1986
Date-Received: Mon, 20-Oct-86 00:31:24 EDT
Distribution: net
Organization: Marine Ecology Lab.
Lines: 108

Here are some opinions about the problem of posting binaries, along with
a draft solution.  There should be some discussion on the net before it
gets implemented.

Sources are no substitute for binaries, since not everyone has the same
compiler, or even language, on micros.

Binaries have to be encoded as ASCII files.  But there is no reason why
we have to use uuencode!  There are evidently problems with it, and we
should feel free to invent an alternate encoding method which avoids the
problems with uuencode.  These problems, aside from the minor one that
uuencode is designed for the Unix environment, are that some characters
(such as curly braces {}) do not make it through all nodes unscathed
(IBM machines and others with EBCDIC codes appear to be the culprits),
and for long files the postings have to be combined in an editor.
Another problem is that uudecode is a complicated program which a lot of
users have trouble getting or rewriting.

I propose that we develop an encoding method for microcomputers that
meets these requirements:

> So simple that users can easily learn the protocol and write their own
version of the decoding program.  Uudecode is relatively easy to write
in C, but gets tricky in languages that do not have low-level bit
operations.

> Moderately compact, to keep the traffic volume down.

> Reasonably good error trapping to check for damaged files.

> Convenient to use, preferably not requiring the use of an editor even
for multi-part postings.

One possibility would be to post hex files, but these are very bulky, at
least twice as long as the binary being posted.  However, a
generalization of posting hex will work -- if we encounter the letter G
in a hex file we know it is an error, but we can also adopt the
convention that the letters G-Z do not have to be encoded, so that they
are represented by one byte in the encoded file instead of two.  This
can save a lot of space.  Based on this, here is my proposal:

	*** TO ENCODE A FILE ***

Read through the file a byte at a time, and classify each byte as
follows:

>OK, pass through unchanged

>TRANSFORM to a single byte

>ENCODE as a pair of bytes

The encoding I propose is a modified hex, using the letters A-P instead
of the usual hex 0-9A-F -- the reason for this is that it is trivial to
map this way, e.g., value = char - 'A'.  The rest of upper case letters,
Q-Z, can be used for error checking and for 1-byte transformations of
common non-graphic bytes, such as NULL and NEWLINE.  Thus the actual
encoding rules could be:

>OK includes digits 0-9, lower case alphabet, and punctuation marks.

>TRANSFORM \0 -> Q, \r -> R, space -> S, \t -> T, etc.

>ENCODE all upper case letters and other characters into modified hex
codes, AA to PP.
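
As a rough sketch only, the three rules above might look like this in
Python.  It is not one of the programs mentioned in this post.  The
names TRANSFORM, OK and encode are made up for the illustration, only
the four transforms listed above are filled in (the "etc." entries are
left unassigned), and "punctuation marks" is assumed to mean the ASCII
punctuation in string.punctuation.

```python
import string

# Single-byte transforms listed in the post; the "etc." entries are
# deliberately left unassigned here.
TRANSFORM = {0x00: "Q", 0x0D: "R", 0x20: "S", 0x09: "T"}

# Assumption: "punctuation marks" means ASCII punctuation.
OK = set((string.digits + string.ascii_lowercase
          + string.punctuation).encode("ascii"))


def encode(data: bytes) -> str:
    """Encode bytes using the OK / TRANSFORM / ENCODE classification."""
    out = []
    for b in data:
        if b in OK:
            out.append(chr(b))                 # pass through unchanged
        elif b in TRANSFORM:
            out.append(TRANSFORM[b])           # one-byte transform
        else:
            # modified hex: high and low nibbles mapped onto A-P
            out.append(chr(ord("A") + (b >> 4)) + chr(ord("A") + (b & 0x0F)))
    return "".join(out)
```

Note that string.punctuation includes the curly braces, so a real
encoder would probably want to drop them from the OK set and let them
fall through to the modified hex case.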

I have done this encoding on a number of files using a crude set of
programs that I wrote a while back when I didn't have xmodem working on
my net machine and couldn't get uudecode working on my micro -- the
files were generally no larger than uuencoded files, often smaller.

To avoid very long lines, adopt the convention that white space is
ignored, so that you can put in newlines wherever you want (probably not
in the middle of a hex pair though).
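
One way to break the encoded text into lines without splitting a hex
pair is sketched below in Python.  The function name wrap and the
default width of 72 are assumptions made for the illustration; the
only thing taken from the scheme is that a letter A-P always starts a
two-character pair.

```python
def wrap(encoded: str, width: int = 72) -> str:
    """Insert newlines into encoded text, never inside a hex pair."""
    lines = []
    current = ""
    i = 0
    while i < len(encoded):
        # A letter A-P starts a two-character pair; anything else is
        # a single character.
        step = 2 if "A" <= encoded[i] <= "P" else 1
        token = encoded[i:i + step]
        if current and len(current) + len(token) > width:
            lines.append(current)
            current = ""
        current += token
        i += step
    if current:
        lines.append(current)
    return "\n".join(lines)
```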

To decode a file, one simply reverses the process.  Read through the
file a byte at a time, and use switch or a set of ifs to do the
following:

>letter A-P?  Read next byte and output 16*(first-'A') + (second - 'A')

>letter Q-Z?  Output \0, \r, etc., according to above table.

>anything else?  Output it as it stands.
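
The decoding steps above can be sketched in Python as follows.  This
is only an illustration under stated assumptions: the names REVERSE
and decode are invented, the table holds just the four transforms
given earlier, and a letter A-P that is not followed by a second
letter A-P is simply treated as an error.

```python
# Reverse of the four transforms given in the post; any other letter
# in Q-Z is unassigned in this sketch.
REVERSE = {"Q": 0x00, "R": 0x0D, "S": 0x20, "T": 0x09}


def decode(text: str) -> bytes:
    """Decode modified-hex text back into bytes, ignoring white space."""
    out = bytearray()
    chars = iter(c for c in text if not c.isspace())
    for c in chars:
        if "A" <= c <= "P":
            second = next(chars, None)
            if second is None or not ("A" <= second <= "P"):
                raise ValueError("incomplete or invalid hex pair after " + c)
            out.append(16 * (ord(c) - ord("A")) + (ord(second) - ord("A")))
        elif "Q" <= c <= "Z":
            if c not in REVERSE:
                raise ValueError("unassigned transform letter " + c)
            out.append(REVERSE[c])
        else:
            out.append(ord(c))                 # output it as it stands
    return bytes(out)
```

For example, decode("EIiQSx") gives the five bytes 'H', 'i', NUL,
space, 'x'.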

	*** REFINEMENTS ***

I haven't said anything yet about error checking, convenience, etc.
Note that there are several byte combinations that are not used in this
scheme of things, specifically a letter A-P followed by Q-Z.  These can
be used to add these features.  For example, an encoded file should
begin with the pair AZ and end with PZ, similar to the begin and end
lines used by uuencode.  However, we could also adopt the convention
that when a file is broken into parts, the first part ends with BZ, the
next begins with CZ, and so on.  This way one could simply decode a set
of files without first combining them -- the program would start at the
AZ flag, and stop when it found BZ.  Then it would go on to the next
file and search for CZ, etc.  If it didn't find PZ at the end of the
last file, or if the codes were out of order, it would complain.
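
A sketch of how a decoder might walk a set of parts using these flags
is given below in Python.  Several things here are assumptions: the
name extract_payload is invented, the "and so on" is read as DZ, EZ,
... continuing in order (which allows at most eight parts), and the
start flag is located with a plain text search, which could be fooled
by the same two letters turning up in a news header.

```python
def extract_payload(parts):
    """Join the encoded payload of several postings, checking flag order."""
    if len(parts) > 8:
        raise ValueError("this sketch allows at most 8 parts")
    payload = []
    for k, text in enumerate(parts):
        last = (k == len(parts) - 1)
        start = chr(ord("A") + 2 * k) + "Z"
        end = "PZ" if last else chr(ord("A") + 2 * k + 1) + "Z"
        pos = text.find(start)
        if pos < 0:
            raise ValueError("part %d: start flag %s not found" % (k + 1, start))
        chars = iter(c for c in text[pos + 2:] if not c.isspace())
        for c in chars:
            if "A" <= c <= "P":
                second = next(chars, "")
                if second == "Z":              # A-P followed by Z is a flag
                    if c + second != end:
                        raise ValueError("part %d: expected %s, found %s"
                                         % (k + 1, end, c + second))
                    break
                payload.append(c + second)     # ordinary hex pair
            else:
                payload.append(c)
        else:
            raise ValueError("part %d: end flag %s not found" % (k + 1, end))
    return "".join(payload)
```

The flags are recognized while stepping through the text pair by pair
rather than by searching for "BZ", since the letters B and Z can also
sit next to each other when B is the second half of a hex pair.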

Further refinements would be to add various checksums, set off by other
unused code pairs.  I'll pass on this one: it sounds like a good idea,
but adds to the complication.  Perhaps it could be made optional, such
as writing a checksum after each termination code like BZ ... PZ.
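
The checksum itself is left undefined here.  Purely as an
illustration, one hypothetical choice would be the sum of the decoded
bytes modulo 256, written as a single modified-hex pair after the
termination code.  The name checksum_pair and the choice of a simple
sum are assumptions, not part of the proposal.

```python
def checksum_pair(data: bytes) -> str:
    """Hypothetical checksum: byte sum mod 256 as one modified-hex pair."""
    total = sum(data) % 256
    return chr(ord("A") + (total >> 4)) + chr(ord("A") + (total & 0x0F))
```

A decoder using this convention would compute the same pair over the
bytes it has output so far and complain if it differs from the pair
found after the termination code.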

If this idea seems reasonable, perhaps net moderators could carry the
ball from here.  Unfortunately this site is not very reliable for news
and mail.