Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!zaphod.mps.ohio-state.edu!mips!bridge2!jarthur!uci-ics!ucla-cs!wales
From: wales@valeria.cs.ucla.edu (Rich Wales)
Newsgroups: comp.binaries.ibm.pc.d
Subject: Re: which archiver/compresser and encoder/decoder to use?
Message-ID: <32208@shemp.CS.UCLA.EDU>
Date: 23 Feb 90 00:06:18 GMT
References: <522.25E37FEB@blkcat.fidonet.org>
Sender: news@CS.UCLA.EDU
Reply-To: wales@CS.UCLA.EDU (Rich Wales)
Organization: UCLA CS Department, Los Angeles
Lines: 48

In article <522.25E37FEB@blkcat.fidonet.org>
Jerry.Andrews@f426.n109.z1.fidonet.org (Jerry Andrews) writes:

	 RW> whereas UUENCODE uses the following character output
	 RW> function --
	 RW> 
	 RW> #define ENC(c) (((c) & 077) + ' ')

	How do I apply this function to the source file to get an output
	file?  Looks to me like, if this function is applied to each
	byte of the source file, you'd wipe all the hi bits, and not
	end up with a complete set of printable characters anyway...

I wrote my article with the underlying assumption that the people who
would read it had a thorough understanding of how UUENCODE works.
Since this assumption may not have been fair, let me elaborate.

UUENCODE takes sets of three 8-bit bytes, rearranges the bits into four
6-bit groups, and translates each of these 6-bit groups into a
printable ASCII character.
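
As a rough sketch of that rearrangement: the ENC macro is never applied
to a whole 8-bit byte, only to a 6-bit group, so no data bits are lost.
The function name encode_triple is made up for this illustration and is
not taken from any particular UUENCODE source.

```c
/* Character output function quoted earlier in the thread. */
#define ENC(c) (((c) & 077) + ' ')

/*
 * Encode three 8-bit bytes (in[0..2]) as four printable characters
 * (out[0..3]).  encode_triple is a hypothetical name for this sketch.
 */
static void encode_triple(const unsigned char in[3], char out[4])
{
    /* top 6 bits of byte 0 */
    out[0] = (char)ENC(in[0] >> 2);
    /* low 2 bits of byte 0, then top 4 bits of byte 1 */
    out[1] = (char)ENC(((in[0] << 4) & 060) | ((in[1] >> 4) & 017));
    /* low 4 bits of byte 1, then top 2 bits of byte 2 */
    out[2] = (char)ENC(((in[1] << 2) & 074) | ((in[2] >> 6) & 003));
    /* low 6 bits of byte 2 */
    out[3] = (char)ENC(in[2] & 077);
}
```

For example, the three bytes "Cat" split into the 6-bit values 16, 54,
5 and 52, which come out as the four characters 0V%T.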

Additionally, each line of output from UUENCODE starts with a one-byte
count of the number of 8-bit bytes encoded in that line.  The reason
most lines in a UUENCODE file start with the capital letter "M" is that
this letter corresponds to the number 45 in UUENCODE's scheme -- and
most of the lines in a UUENCODE file contain 61 printable characters
(the initial "M", plus 60 characters that correspond to 45 8-bit bytes).
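
The arithmetic behind the "M" can be sketched as follows.  The helper
names length_char and line_chars are hypothetical, and the sketch
assumes the usual behavior of padding a short final group out to a
full three bytes.

```c
#define ENC(c) (((c) & 077) + ' ')

/* Length character for a line carrying n data bytes (0 <= n <= 45). */
static char length_char(int n)
{
    return (char)ENC(n);
}

/*
 * Printable characters on a line carrying n data bytes, counting the
 * length character.  Every group of up to 3 bytes becomes 4 characters.
 */
static int line_chars(int n)
{
    return 1 + 4 * ((n + 2) / 3);
}
```

With n = 45, length_char gives 45 + 32 = 77, which is ASCII "M", and
line_chars gives 1 + 4 * 15 = 61.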

The "character output function" I described in my article is what
UUENCODE uses to translate its 6-bit groups into printable characters.
I should probably have added that some versions of UUENCODE output a
grave accent (`) instead of a space -- as protection against some mail
systems that strip trailing blanks from lines in messages.
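
A minimal sketch of that variant, assuming it differs only in how the
6-bit value zero is written.  The macro names ENC_GRAVE and DEC are
chosen for this illustration; real sources may spell them differently.

```c
/* Like ENC, but write the 6-bit value 0 as '`' rather than ' '. */
#define ENC_GRAVE(c) (((c) & 077) ? ((c) & 077) + ' ' : '`')

/*
 * Matching decoder.  Because '`' is ' ' + 64, masking with 077 maps
 * both ' ' and '`' back to 0, so one decoder handles either variant.
 */
#define DEC(c) (((c) - ' ') & 077)
```

The point of the mask in DEC is that a file written by either kind of
encoder decodes the same way.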

The only real difference between UUENCODE and XXENCODE is in the
function that translates 6-bit groups into printable characters.  Some of
the characters used by UUENCODE (punctuation marks such as the square
brackets, backslash, and caret) get destroyed -- translated into
question marks, or deleted altogether -- by many machines which use
IBM's EBCDIC character set.  This is why lots of people have been
objecting to UUENCODE and asking for some other program to be used in
its place.  XXENCODE uses a different set of printable characters that
aren't disturbed by any known ASCII/EBCDIC translation schemes in use
on USENET.
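
For comparison, here is a sketch of a table-driven output function of
the XXENCODE kind.  The table below is the character set as commonly
distributed with XXENCODE, as far as I know; treat it as an assumption
and check it against the copy of the program you actually have.  The
names xx_set and XX_ENC are made up for this example.

```c
/* 64 characters: plus, minus, digits, upper case, lower case. */
static const char xx_set[] =
    "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Translate a 6-bit value into a printable character by table lookup. */
#define XX_ENC(c) (xx_set[(c) & 077])
```

With this table a full 45-byte line starts with "h" rather than "M",
and none of the characters used is a bracket, backslash, or caret.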

-- Rich Wales <wales@CS.UCLA.EDU> // UCLA Computer Science Department
   3531 Boelter Hall // Los Angeles, CA 90024-1596 // +1 (213) 825-5683
   "Then they hurl heavy objects. . . .  And claw at you. . . ."