Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!wuarchive!mit-eddie!rutgers!iuvax!maytag!looking!brad
From: brad@looking.on.ca (Brad Templeton)
Newsgroups: comp.binaries.ibm.pc.d
Subject: Re: Stoopid *nix freaks!
Message-ID: <100352@looking.on.ca>
Date: 22 Feb 90 06:16:18 GMT
References: <90052.182144CMH117@psuvm.psu.edu>
Organization: Looking Glass Software Ltd.
Lines: 29
Class: discussion

Sadly enough there are only 84 usable printable characters that can safely
survive an EBCDIC translation.  That's a shame, because
84^5 is just slightly less than 2^32, and 85^5 is bigger, so
with 85 we could make an encoder mapping 4 bytes to 5 printables that only
had a 25%+ expansion, instead of the 36%+ expansion of xxencode and uuencode.
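
To make the arithmetic concrete, here is a minimal Python sketch of the
capacity check and of a 4-bytes-to-5-printables mapping.  The alphabet
(85 consecutive ASCII characters starting at '!') and the function names are
assumptions chosen for illustration; this is not ABE's character set, and that
particular alphabet is not claimed to be EBCDIC-safe.

```python
# Sketch only: ALPHABET is an assumed set of 85 printable ASCII characters,
# not the set used by ABE and not guaranteed to survive EBCDIC translation.
ALPHABET = "".join(chr(c) for c in range(33, 33 + 85))


def capacity(n_symbols, group=5):
    """Number of distinct values a group of `group` symbols can represent."""
    return n_symbols ** group


def encode_group(four_bytes):
    """Map exactly 4 bytes to 5 symbols by writing the 32-bit value in base 85."""
    if len(four_bytes) != 4:
        raise ValueError("need exactly 4 bytes")
    value = int.from_bytes(four_bytes, "big")
    digits = []
    for _ in range(5):
        value, d = divmod(value, 85)
        digits.append(ALPHABET[d])
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(digits))


def decode_group(five_chars):
    """Inverse of encode_group: 5 symbols back to the original 4 bytes."""
    value = 0
    for ch in five_chars:
        value = value * 85 + ALPHABET.index(ch)
    return value.to_bytes(4, "big")


if __name__ == "__main__":
    print(capacity(84), 2 ** 32, capacity(85))
    print(encode_group(b"\x00\x01\x02\x03"))
```

With only 84 symbols, capacity(84) falls short of 2^32, so some 4-byte values
would have no 5-character code; with 85 every value fits.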

ABE's expansion in the ABE2 format (safe for EBCDIC) varies based on the
file.  Unfortunately it was not designed for packed data, but you get
figures like this (including headers -- above figures did not do this):

	Text file                  /etc/termcap    5.8%
	Unix Binary                /unix           17.4%
	DOS Binary                 kermit.exe      19.4%
	ARC File                   uupc.arc        37.8%
	ARC File (ABE UUE format)  uupc.arc        38%
	Compressed file            rfc822.Z        40%

The last one is pretty much the worst case.  In such cases you can go to UUENCODE
format, which is pretty constant at 36% plus headers.  The above were done
without line numbers.  The line numbers are a redundancy.  Using them, you
can take a scrambled ABE file, put it through 'sort' and get the data back
in order.  This is not normally needed, as DABE can already handle multi-block
files with the blocks in the wrong order (or with duplicates), but if the
user has only a tiny dabe decoder, sort can do the trick to handle blocks in
random order (though not duplicates).

ABE1 format does slightly better
(16.5% on kermit.exe vs. 19.4% for ABE2 and 38% for UUENCODE) but it uses
all printable characters, including the evil ones that die over BITNET.
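
The 'sort' trick mentioned above can be illustrated with a small Python
sketch.  The line layout here (a zero-padded sequence number, a space, then
the data) is purely hypothetical and is not ABE's actual line format; the
point is only that fixed-width numbers make a plain lexical sort restore the
original order.

```python
import random


def number_lines(lines, width=6):
    """Prefix each line with a fixed-width sequence number (hypothetical layout)."""
    return [f"{i:0{width}d} {line}" for i, line in enumerate(lines)]


def restore(scrambled, width=6):
    """Sort lexically, then strip the number and the separating space.

    Duplicated lines are NOT removed; a plain sort keeps every copy.
    """
    return [line[width + 1:] for line in sorted(scrambled)]


if __name__ == "__main__":
    original = ["first", "second", "third", "fourth"]
    scrambled = number_lines(original)
    random.shuffle(scrambled)
    print(restore(scrambled))
```

A decoder that only reads lines in sequence could then be fed the sorted
output, at the cost of the extra characters per line.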
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473