Xref: utzoo comp.binaries.ibm.pc.d:248 comp.sources.d:2117
Path: utzoo!attcan!uunet!husc6!bloom-beacon!tut.cis.ohio-state.edu!mandrill!hal!ncoast!allbery
From: allbery@ncoast.UUCP (Brandon S. Allbery)
Newsgroups: comp.binaries.ibm.pc.d,comp.sources.d
Subject: Re: Standard for file transmission
Message-ID: <7781@ncoast.UUCP>
Date: 15 May 88 21:10:03 GMT
References: <299@cullsj.UUCP> <2096@epimass.EPI.COM> <563@csccat.UUCP>
Reply-To: allbery@ncoast.UUCP (Brandon S. Allbery)
Followup-To: comp.binaries.ibm.pc.d
Organization: Cleveland Public Access UN*X, Cleveland, Oh
Lines: 47

As quoted from <563@csccat.UUCP> by loci@csccat.UUCP (Chuck Brunow):
+---------------
| In article <2096@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
| >In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
| >>   I stand corrected.  Since Lem-Ziv was DESIGNED for text compression, and
| >>the authors do not mention its use for binaries, i never considered using it.
| >>I tried it on an executable under UNIX and obtained a good reduction, for
| >>reasons which are not apparent.  I'm sure that there are cases where this does
|
|       This is actually partially true. The first "compress" to appear
|       on the net (several years ago) only worked on text files and
|       dumped core on binary files. The reason you get good compressions
|       on binary files is probably that they haven't been stripped of
|       the relocation info. Strip them first and I doubt that the
|       compression will be so good (otherwise, throw your optimizer
|       into the bit bucket). Typical (large) text compression is about
|       67%, whereas binaries are closer to 20%. (I use 16-bit compress).
+---------------

Wrong.  Consider that, for example, every call to putchar() contains some
fixed code (such as a call to _flsbuf()); this, on a 32-bit address space
machine, will always be the same byte sequence (on a 680x0, it's 6 bytes).
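A rough way to see the effect, offered as a sketch and not as a measurement:
the Python below counts how many codes a plain LZW pass emits for two synthetic
24000-byte buffers.  LZW is the scheme behind compress, but this version leaves
out its variable code widths and table resets.  One buffer repeats a fixed
6-byte call sequence after every 6 bytes of random filler; the other is random
throughout.  The bytes 4e b9 are the 680x0 encoding of jsr with an absolute
long address; the target address, the function names, the seed and the block
layout are all made up for illustration.

```python
import random

# 680x0 "jsr <absolute long>" opcode (0x4EB9) followed by a hypothetical
# 32-bit address standing in for _flsbuf; the address is an assumption.
JSR_FLSBUF = bytes.fromhex("4eb900012345")


def lzw_code_count(data: bytes, max_bits: int = 16) -> int:
    """Count the codes a basic LZW encoder would emit for data.

    The dictionary starts with the 256 single bytes and is frozen once it
    holds 2**max_bits entries (no reset, unlike the real compress).
    """
    table = {bytes([i]): i for i in range(256)}
    limit = 1 << max_bits
    codes = 0
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate          # keep extending the current phrase
        else:
            codes += 1                   # emit the code for the known phrase
            if len(table) < limit:
                table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes += 1                       # flush the final phrase
    return codes


def synthetic_code(blocks: int, shared_call: bool, seed: int = 1988) -> bytes:
    """Build blocks of 6 filler bytes followed by 6 'call' bytes.

    With shared_call=True the call bytes are always JSR_FLSBUF; otherwise
    they are random too, giving a same-sized buffer with no repetition.
    """
    rng = random.Random(seed)
    out = bytearray()
    for _ in range(blocks):
        out += bytes(rng.randrange(256) for _ in range(6))
        if shared_call:
            out += JSR_FLSBUF
        else:
            out += bytes(rng.randrange(256) for _ in range(6))
    return bytes(out)


if __name__ == "__main__":
    for label, shared in (("repeated call", True), ("all random", False)):
        buf = synthetic_code(2000, shared_call=shared)
        print(label, len(buf), "bytes ->", lzw_code_count(buf), "codes")
```

Fewer codes for the first buffer means the repeated call sequence is being
folded into dictionary entries.  Real executables are far less random than this
filler, so the counts say nothing about actual compression ratios.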
Other things will also be common:

        printf("format", non-double-value);

(which is by far the *most* common use of printf(), from what I've seen;
perhaps others have seen other more common calls) has the constant assembler
code on a 680x0:

        jsr     _printf         6 bytes
        addql   #8,sp           2 bytes

(and "printf("constant")", also common, is a slightly different 8-byte
value).  These kinds of extremely common operations can't be optimized out
and are quite amenable to compression.

RISC executables are likely to be even more amenable to compression, since
many operations will assemble into lengthy byte sequences -- many of which
will be partially or totally identical.

Ergo:  compression of executables generally works pretty well.  (I regularly
see 50%-60% on stripped, optimized executables on ncoast.)
-- 
Brandon S. Allbery, moderator of comp.sources.misc
        {well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery
Delphi: ALLBERY                                      MCI Mail: BALLBERY