Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 ARLENE 1/1/85; site dalcs.UUCP
Path: utzoo!utcsri!utai!garfield!dalcs!silvert
From: silvert@dalcs.UUCP (Bill Silvert)
Newsgroups: net.micro,net.micro.atari16
Subject: New way to post binaries
Message-ID: <2035@dalcs.UUCP>
Date: Sat, 18-Oct-86 10:59:09 EDT
Article-I.D.: dalcs.2035
Posted: Sat Oct 18 10:59:09 1986
Date-Received: Mon, 20-Oct-86 00:31:24 EDT
Distribution: net
Organization: Marine Ecology Lab.
Lines: 108

Here are some opinions about the problem of posting binaries, along
with a draft solution.  There should be some discussion on the net
before it gets implemented.

Sources are no substitute for binaries, since not everyone has the
same compiler, or even language, on micros.

Binaries have to be encoded as ASCII files.  But there is no reason
why we have to use uuencode!  There are evidently problems with it,
and we should feel free to invent an alternate encoding method which
avoids them.

These problems, aside from the minor one that uuencode is designed for
the Unix environment, are that some characters (such as curly braces
{}) do not make it through all nodes unscathed (IBM machines and
others with EBCDIC codes appear to be the culprits), and that for long
files the postings have to be combined in an editor.

Another problem is that uudecode is a complicated program which a lot
of users have trouble getting or rewriting.

I propose that we develop an encoding method for microcomputers that
meets these requirements:

> So simple that users can easily learn the protocol and write their
  own version of the decoding program.  Uudecode is relatively easy to
  write in C, but gets tricky in languages that do not have low-level
  bit operations.

> Moderately compact, to keep the traffic volume down.

> Reasonably good error trapping to check for damaged files.

> Convenient to use, preferably not requiring the use of an editor
  even for multi-part postings.
One possibility would be to post hex files, but these are very bulky,
at least twice as long as the binary being posted.  However, a
generalization of posting hex will work -- if we encounter the letter
G in a hex file we know it is an error, but we can also adopt the
convention that the letters G-Z do not have to be encoded, so that
they are represented by one byte in the encoded file instead of two.
This can save a lot of space.

Based on this, here is my proposal:

*** TO ENCODE A FILE ***

Read through the file a byte at a time, and classify each byte as
follows:

>OK, pass through unchanged
>TRANSFORM to a single byte
>ENCODE as a pair of bytes

The encoding I propose is a modified hex, using the letters A-P
instead of the usual hex 0-9A-F -- the reason for this is that it is
trivial to map this way, e.g., value = char - 'A'.  The rest of the
upper case letters, Q-Z, can be used for error checking and for 1-byte
transformations of common non-graphic bytes, such as NUL and NEWLINE.

Thus the actual encoding rules could be:

>OK includes digits 0-9, lower case alphabet, and punctuation marks.
>TRANSFORM \0 -> Q, \r -> R, space -> S, \t -> T, etc.
>ENCODE all upper case letters and other characters into modified hex
 codes, AA to PP.

I have done this encoding on a number of files using a crude set of
programs that I wrote a while back when I didn't have xmodem working
on my net machine and couldn't get uudecode working on my micro -- the
files were generally no larger than uuencoded files, often smaller.

To avoid very long lines, adopt the convention that white space is
ignored, so that you can put in newlines wherever you want (probably
not in the middle of a hex pair though).

To decode a file, one simply reverses the process.  Read through the
file a byte at a time, and use switch or a set of ifs to do the
following:

>letter A-P?  Read next byte and output 16*(first-'A') + (second-'A')
>letter Q-Z?  Output \0, \r, etc., according to above table.
>anything else?
Output it as it stands.

*** REFINEMENTS ***

I haven't said anything yet about error checking, convenience, etc.
Note that there are several byte combinations that are not used in
this scheme of things, specifically a letter A-P followed by Q-Z.
These can be used to add these features.  For example, an encoded file
should begin with the pair AZ and end with PZ, similar to the begin
and end lines used by uuencode.  However, we could also adopt the
convention that when a file is broken into parts, the first part ends
with BZ, the next begins with CZ, and so on.  This way one could
simply decode a set of files without first combining them -- the
program would start at the AZ flag and stop when it finds BZ.  Then it
would go on to the next file and search for CZ, etc.  If it didn't
find PZ at the end of the last file, or if the codes were out of
order, it would complain.

Further refinements would be to add various checksums, set off by
other unused code pairs.  I'll pass on this one, since it sounds like
a good idea but adds to the complication.  Perhaps it could be made
optional, such as writing a checksum after each termination code like
BZ ... PZ.

If this idea seems reasonable, perhaps net moderators could carry the
ball from here.  Unfortunately this site is not very reliable for news
and mail.
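
For anyone who wants to experiment, the core encode and decode rules
described above might look roughly like the following in C.  This is
only a sketch under several assumptions, not a finished program: the
function names mhex_encode and mhex_decode are made up for
illustration; only the four transformations actually listed (Q, R, S,
T) are implemented, so a newline is simply hex-encoded; every
punctuation mark is passed through, although a real version would
probably want to exclude troublesome ones such as curly braces; and
the AZ ... PZ flags and checksums are left out, so any A-P letter
followed by something outside A-P is reported as an error.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: encode n bytes from in[] into out[] as a
   NUL-terminated string.  out[] needs room for 2*n + 1 characters.
   Returns the number of characters written (not counting the NUL). */
size_t mhex_encode(const unsigned char *in, size_t n, char *out)
{
    size_t i, j = 0;

    for (i = 0; i < n; i++) {
        unsigned char c = in[i];

        /* TRANSFORM: the four one-byte substitutions from the post */
        if (c == '\0')      out[j++] = 'Q';
        else if (c == '\r') out[j++] = 'R';
        else if (c == ' ')  out[j++] = 'S';
        else if (c == '\t') out[j++] = 'T';
        /* OK: digits, lower case, punctuation pass through
           (assumption: all punctuation is treated as safe here) */
        else if (isdigit(c) || islower(c) || ispunct(c))
            out[j++] = (char)c;
        /* ENCODE: everything else becomes a pair of letters A-P,
           using only division and remainder, no bit operations */
        else {
            out[j++] = (char)('A' + c / 16);
            out[j++] = (char)('A' + c % 16);
        }
    }
    out[j] = '\0';
    return j;
}

/* Hypothetical helper: decode a NUL-terminated encoded string into
   out[].  White space is ignored.  Returns the number of bytes
   written, or -1 if the input is malformed for this sketch. */
long mhex_decode(const char *in, unsigned char *out)
{
    size_t i;
    long j = 0;

    for (i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];

        if (isspace(c))
            continue;                      /* white space is ignored */
        if (c >= 'A' && c <= 'P') {        /* first half of a hex pair */
            unsigned char d = (unsigned char)in[i + 1];
            if (d < 'A' || d > 'P')
                return -1;                 /* flags like AZ not handled */
            out[j++] = (unsigned char)(16 * (c - 'A') + (d - 'A'));
            i++;                           /* consume the second letter */
        }
        else if (c == 'Q') out[j++] = '\0';
        else if (c == 'R') out[j++] = '\r';
        else if (c == 'S') out[j++] = ' ';
        else if (c == 'T') out[j++] = '\t';
        else if (c >= 'U' && c <= 'Z')
            return -1;                     /* unassigned in this sketch */
        else
            out[j++] = c;                  /* output it as it stands */
    }
    return j;
}
```

With these rules the three bytes 'H', 'i', space come out as EIiS: the
upper case H (hex 48) becomes the pair EI, the lower case i passes
through, and the space becomes S.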