Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.1 6/24/83; site ecsvax.UUCP Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!think!harvard!talcott!panda!genrad!decvax!mcnc!ecsvax!hes From: hes@ecsvax.UUCP (Henry Schaffer) Newsgroups: net.dcom,net.micro Subject: Squeezing program files. Message-ID: <1414@ecsvax.UUCP> Date: Wed, 5-Jun-85 14:16:58 EDT Article-I.D.: ecsvax.1414 Posted: Wed Jun 5 14:16:58 1985 Date-Received: Fri, 7-Jun-85 03:35:34 EDT Organization: NC State Univ. Lines: 35 Xref: watmath net.dcom:1025 net.micro:10686 Data Compression Using Static Huffman Code-Decode Tables (Communications of the ACM 28:612-616 June 1985) by D. R. Mcintyre and M. A. Pechura "Both static and dynamic Huffman coding techniques are applied to test data consisting of 530 source programs in four different languages. The results indicate that, for small files, a savings of 22-91 percent in compression can be achieved by using the static instead of dynamic techniques." -------- my summary ------ It is common to compress (squeeze) programs for storage and transmission economies. A common techniques is to count the symbol (character) frequencies in the program file, assign variable length codes, with more common symbols getting shorter codes (Huffman coding), encode the program, and append the encode table for subsequent decoding (unsqueezing). This is called dynamic Huffman coding. Static Huffman coding proceeds by starting with an encode table which has been previously determined for programs in general (or perhaps for that particular programming language.) Therefore, the table need not be appended to the program. This more than compensates on the average for the less perfect encoding used for each particular program - while for short programs there were major space savings from the static method. In addition, the static method is considerably faster, because the accumulation of frequencies and generation of the table is omitted. Source programs without trailing blanks compress to about 50% of original size, while source program files which include trailing blanks compress to about 30% of original size. --------- It is my impression that the typical squeeze programs use the dynamic Huffman method. Is this so? If it is, should we consider changing to the static method? --henry schaffer n c state univ