Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!asuvax!mcdphx!riscokid.UUCP!fnf
From: fnf@riscokid.UUCP (Fred Fish)
Newsgroups: comp.sys.amiga
Subject: Re: When and when not to use BRU for backups
Message-ID: <13469@mcdphx.phx.mcd.mot.com>
Date: 21 Aug 90 18:34:00 GMT
References: <03880.AA03880@sosaria.imp.com>
Sender: listen@mcdphx.phx.mcd.mot.com
Reply-To: fnf@riscokid.UUCP (Fred Fish)
Organization: Motorola Microcomputer Division, Tempe, Az.
Lines: 49

In article <03880.AA03880@sosaria.imp.com> wizard@sosaria.imp.com (Chris Brand) writes:
>On my harddisk, every binary file bigger than 10K is crunched, giving me
>about twice as much room for binary files than without crunching anything.
>If you compress crunched files, including archives made with zip or lharc,
>they get bigger than before. Example: I have a lzh file with 850K. If I
>compress it using BRU's Huffman encoding with 16 bits, it takes me
>afterwards over 1000K.

BRU does not use Huffman coding; it uses the standard LZW compression
algorithm, defaulting to 12 bits but able to use up to 16 bits if you
request it with the -N flag.  You are correct that most compressed files
will get bigger when compressed again.

>BRU doesn't check if the compressed file is larger
>than the original and stores only the compressed version. With this
>unlogical behaviour of BRU, I gain nearly no disks compared to Quarterback,

This is incorrect.  BRU checks that the final compressed version is no
larger than the original, and saves either the compressed version or the
original as appropriate.  When run with -vvvv it will report which files
are saved uncompressed.
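
To make the rule concrete, here is a rough Python sketch of "keep whichever
version is no larger".  It is only an illustration, not BRU's code: zlib
stands in for the LZW pass (a plain substitution on my part), and the
function name pack() and the "compressed"/"stored" labels are made up for
the example.

```python
import zlib
from typing import Tuple


def pack(data: bytes) -> Tuple[str, bytes]:
    """Return the payload to archive and a label saying which form was kept.

    zlib is used here purely as a stand-in compressor (assumption);
    BRU itself uses LZW.
    """
    candidate = zlib.compress(data)
    # Keep the compressed form only if it is no larger than the original.
    if len(candidate) <= len(data):
        return "compressed", candidate
    # Otherwise fall back to the untouched original bytes.
    return "stored", data
```

Feeding this an already-crunched file normally comes back as "stored",
so recompression by itself never makes the saved data larger than the
input.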

The "growth factor" you are seeing is the BRU archive overhead.  A summary
of this overhead is:

	1)	12.5% for the block headers on each 2K archive block.
		The block header is 256 bytes (256/2048 = 12.5%), and
		contains lots of information BRU uses to keep track of
		each archive block.

	2)	Each archived file requires one 2K file header block
		and then an arbitrary number of data blocks, each 2K.
		Thus a zero length file consumes 2K of archive space.
		A 1 byte file consumes 4K (2K header + 1 data block).

	3)	The archive has a 2K archive header block and a 2K archive
		trailer block.

Thus approximately 1000K of archive space to store an 850K file sounds
about right.  The overhead takes an even bigger bite for groups of
small files, because of the file header block.
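
The arithmetic can be checked with a few lines of Python.  This is a
back-of-the-envelope sketch, not anything taken from BRU: the names are
invented, and it assumes every data block carries 2048 - 256 = 1792 bytes
of file data and that the file header block carries none.

```python
BLOCK = 2048              # size of every archive block, in bytes
HEADER = 256              # per-block header, in bytes
PAYLOAD = BLOCK - HEADER  # assumed file data per data block: 1792 bytes


def archive_bytes(file_sizes):
    """Estimate total archive size in bytes for the given file sizes."""
    blocks = 2  # one archive header block + one archive trailer block
    for size in file_sizes:
        data_blocks = -(-size // PAYLOAD)  # ceiling division
        blocks += 1 + data_blocks          # file header block + data blocks
    return blocks * BLOCK


if __name__ == "__main__":
    total = archive_bytes([850 * 1024])
    print(total // 1024, "K")  # prints: 978 K
```

A single 850K file needs 486 data blocks plus 3 header/trailer blocks,
489 blocks in all, or 978K.  By the same estimate, one hundred 1 byte
files would take 202 blocks, or 404K.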

The archive format was designed for robustness and expandability, with
compactness only a secondary concern.  In retrospect, it would probably
have been better to pay more attention to space considerations and make
the archive format a pure stream format with blocking done by the I/O
routines.  Maybe in BRU II ...

-Fred