Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!asuvax!mcdphx!riscokid.UUCP!fnf
From: fnf@riscokid.UUCP (Fred Fish)
Newsgroups: comp.sys.amiga
Subject: Re: When and when not to use BRU for backups
Message-ID: <13469@mcdphx.phx.mcd.mot.com>
Date: 21 Aug 90 18:34:00 GMT
References: <03880.AA03880@sosaria.imp.com>
Sender: listen@mcdphx.phx.mcd.mot.com
Reply-To: fnf@riscokid.UUCP (Fred Fish)
Organization: Motorola Microcomputer Division, Tempe, Az.
Lines: 49

In article <03880.AA03880@sosaria.imp.com> wizard@sosaria.imp.com (Chris Brand) writes:
>On my harddisk, every binary file bigger than 10K is crunched, giving me
>about twice as much room for binary files than without crunching anything.
>If you compress crunched files, including archives made with zip or lharc,
>they get bigger than before. Example: I have a lzh file with 850K. If I
>compress it using BRU's Huffman encoding with 16 bits, it takes me
>afterwards over 1000K.

BRU does not use Huffman; it uses the standard LZW compression algorithm,
defaulting to 12 bits, but able to use up to 16 bits if you request it with
the -N flag. You are correct that most compressed files will get bigger
when compressed again.

>BRU doesn't check if the compressed file is larger
>than the original and stores only the compressed version. With this
>unlogical behaviour of BRU, I gain nearly no disks compared to Quarterback,

This is incorrect. BRU checks that the final compressed version is no
larger than the original, and saves either the compressed version or the
original as appropriate. When run with -vvvv it will report which files
are saved uncompressed.

The "growth factor" you are seeing is the BRU archive overhead. A summary
of this overhead is:

1) 12.5% for the block headers on each 2K archive block. The block
   header is 256 bytes, and contains lots of information BRU uses to
   keep track of each archive block.

2) Each archived file requires one 2K file header block and then an
   arbitrary number of data blocks, each 2K. Thus a zero-length file
   consumes 2K of archive space. A 1-byte file consumes 4K (2K header
   + 1 data block).

3) The archive has a 2K archive header block and a 2K archive trailer
   block.

Thus approximately 1000K of archive space to store an 850K file sounds about
right. The overhead takes an even bigger bite for groups of small files,
because of the file header block.

The archive format was designed for robustness and expandability, with
compactness only a secondary concern. In retrospect, it would probably
have been better to pay more attention to space considerations and make
the archive format a pure stream format with blocking done by the I/O
routines. Maybe in BRU II ...

-Fred
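The overhead figures above can be checked with a little arithmetic. What follows is a rough sketch in Python, not BRU code. The names `file_blocks` and `archive_bytes` are made up for illustration, and it assumes that everything in a data block after the 256-byte block header is file payload (1792 bytes per block), which the 12.5% figure suggests but the post does not state outright.

```python
BLOCK_SIZE = 2048      # every archive block is 2K (from the post)
BLOCK_HEADER = 256     # block header size in bytes (from the post)

# Assumption: the remainder of each data block holds file data.
PAYLOAD_PER_BLOCK = BLOCK_SIZE - BLOCK_HEADER  # 1792 bytes


def file_blocks(size_bytes):
    """Blocks used by one file: one file header block plus its data blocks."""
    # Integer ceiling division; a zero-length file needs no data blocks.
    data_blocks = (size_bytes + PAYLOAD_PER_BLOCK - 1) // PAYLOAD_PER_BLOCK
    return 1 + data_blocks


def archive_bytes(file_sizes):
    """Estimated archive size in bytes for the given stored file sizes."""
    # Two extra blocks: the archive header and the archive trailer.
    blocks = 2 + sum(file_blocks(size) for size in file_sizes)
    return blocks * BLOCK_SIZE
```

Under these assumptions, `archive_bytes([850 * 1024])` comes to 489 blocks, or 978K, which is in line with "approximately 1000K". A zero-length file takes one block (2K) and a 1-byte file takes two (4K), matching item 2. A hundred 1-byte files would need 202 blocks, about 404K, which shows how hard the file header block hits small files.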