Newsgroups: comp.os.minix
Path: utzoo!henry
From: henry@utzoo.uucp (Henry Spencer)
Subject: Re: compression
Message-ID: <1989Jul18.174647.19537@utzoo.uucp>
Organization: U of Toronto Zoology
References: <2888@ast.cs.vu.nl>
Date: Tue, 18 Jul 89 17:46:47 GMT

In article <2888@ast.cs.vu.nl> ast@cs.vu.nl (Andy Tanenbaum) writes:
>I wonder if better compression of C programs is possible...
>sort of like libpack.c does, only dynamically instead of using fixed strings.
>... It is my suspicion that such a program could compress
>better than a factor of 2 on C programs.

Andy, I just ran some quick tests using some C-analysis stuff I've got,
and I doubt that a simple approach will give you more than a factor of 2-3.
I ran a few large C programs through a tokenizer (one which retains white
space), and counted both the number of tokens (approximating the number of
output codewords, ignoring limits on codeword size) and the size of the
output after "sort -u" (approximating the size of the codeword dictionary).
This is actually an optimistic estimate because of the limits on codeword
size and the fact that my tokenizer essentially eliminates comments.  Best
case was about a factor of 3.

A quick look at eliminating all white space (i.e. we assume a C-specific
compressor whose decompressor includes a paragrapher) suggests that this
might perhaps get it to a factor of 4 in favorable cases.

All in all, it doesn't seem a promising approach.
-- 
$10 million equals 18 PM       | Henry Spencer at U of Toronto Zoology
(Pentagon-Minutes). -Tom Neff  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
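
The estimate described in the post (token count standing in for the
codeword stream, unique tokens standing in for the dictionary) could be
sketched roughly as follows.  This is not the tokenizer used for the tests
above; the regular expression, the names tokenize() and estimate(), and
the 2-byte codeword size are all assumptions made for illustration.  The
post itself ignores codeword-size limits, so the byte figures here are
only one possible way to turn the two counts into a size.

```python
import re

# Crude C tokenizer (an assumption, not the tool used in the post).
# Alternatives are tried left to right, so comments and literals win
# over the single-character fallback at the end.
TOKEN_RE = re.compile(
    r"/\*.*?\*/"               # block comment (dropped later)
    r"|\s+"                    # run of white space, kept as one token
    r'|"(?:\\.|[^"\\])*"'      # string literal
    r"|'(?:\\.|[^'\\])*'"      # character constant
    r"|[A-Za-z_]\w*"           # identifier or keyword
    r"|\d\w*"                  # numeric constant (rough)
    r"|.",                     # any other single character
    re.DOTALL,
)


def tokenize(source):
    """Split C source into tokens, keeping white space, dropping comments."""
    return [
        m.group(0)
        for m in TOKEN_RE.finditer(source)
        if not m.group(0).startswith("/*")
    ]


def estimate(source, codeword_bytes=2):
    """Estimate compressed size as codeword stream plus dictionary.

    codeword_bytes is a hypothetical fixed codeword width.
    """
    tokens = tokenize(source)
    unique = set(tokens)
    # One codeword per token in the input.
    stream = len(tokens) * codeword_bytes
    # Dictionary: each distinct token once, plus one separator byte,
    # roughly the size of "sort -u" output with one token per line.
    dictionary = sum(len(t) + 1 for t in unique)
    return {
        "tokens": len(tokens),
        "unique": len(unique),
        "original": len(source),
        "estimated": stream + dictionary,
    }
```

Dividing "original" by "estimated" for a large source file gives a
compression factor comparable in spirit to the figures quoted above.  On
very small inputs the dictionary dominates and the estimate can exceed the
original size.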