Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!sun-barr!cs.utexas.edu!uunet!mcvax!hp4nl!phigate!prle!prles2!cstw01!meulenbr
From: meulenbr@cstw01.prl.philips.nl (Frans Meulenbroeks)
Newsgroups: comp.os.minix
Subject: Re: compression
Message-ID: <576@prles2.UUCP>
Date: 19 Jul 89 13:00:52 GMT
References: <2888@ast.cs.vu.nl> <1989Jul18.174647.19537@utzoo.uucp> <2908@ast.cs.vu.nl>
Sender: nobody@prles2.UUCP
Reply-To: meulenbr@cstw01.prl.philips.nl (Frans Meulenbroeks)
Organization: Centre for Software Technology, Philips Eindhoven
Lines: 29

Mmm. I don't think such a compression scheme would help very much.
If you really want to do something like that, the best way seems to be
to tokenize your source and apply Huffman coding to it.

I did a small test using compress.c:

src:              38550 bytes
16 bit compress:  18189 bytes
13 bit compress:  18590 bytes
12 bit compress:  20776 bytes
arc (crunched):   20754 bytes
arc (squashed):   18618 bytes

I don't have the zoo compressor, so I can't give figures for that one.

I think the main problem with the compression scheme you suggest is
that the code contains a lot of comments, which your scheme leaves
unprocessed.

By the way: proper formatting helps a little. After running cb on
compress.c, the 16 bit compress output was only 18030 bytes.
I think indent would format it even better, but unfortunately
my indent had some trouble with compress.c.

Note: all tests were done on a Sun using the SunOS (BSD) utilities.

Frans Meulenbroeks        (meulenbr@cst.prl.philips.nl)
	Centre for Software Technology
	( or try: ...!mcvax!phigate!prle!cst!meulenbr)
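
A minimal sketch of the "tokenize your source and apply Huffman coding" idea from the post, written in Python purely for illustration. Everything here is an assumption rather than anything the post specifies: the tokenizer (`TOKEN_RE`, `tokenize`) is a hypothetical one that splits C source into identifiers, numbers, whitespace runs and single characters, and `encoded_bits` only estimates the size of the coded payload. It ignores the cost of storing the token dictionary and the code table, which a real scheme would have to pay for.

```python
import heapq
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, numbers, whitespace runs,
# and any other single character. Comments get no special treatment.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\s+|.", re.DOTALL)


def tokenize(source):
    """Split source text into tokens; joining them gives the text back."""
    return TOKEN_RE.findall(source)


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next_id, syms1 + syms2))
        next_id += 1
    return lengths


def encoded_bits(symbols):
    """Estimate the Huffman-coded payload size of a symbol sequence, in bits."""
    freqs = Counter(symbols)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)
```

To compare against plain character-level Huffman coding on a file, one could compute `encoded_bits(tokenize(text))` and `encoded_bits(text)` for the same `text`. Since comments are split into ordinary word and punctuation tokens here, a comment-heavy file would show how much of the output they account for.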