Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!uakari.primate.wisc.edu!polyslo!sdsu!crash!pnet01!uzun From: uzun@pnet01.cts.com (Roger Uzun) Newsgroups: comp.sys.amiga.tech Subject: Re: huffman encoding Message-ID: <486@crash.cts.com> Date: 5 Oct 89 14:56:29 GMT Sender: root@crash.cts.com Organization: People-Net [pnet01], El Cajon CA Lines: 25 I had misunderstood huffman encoding then. I thought that regardless of the actual frequency of a token in a file, the bit pattern and bit length used to encode that token was dependent only on its relative frequency to other tokens in the file. Say file A has 10 unique tokens in it and the most probable token occurs 90% of the time. File B also has 10 unqie tokens in it and the most probable token occurs 30% of the time. It was my understanding of huffman that the bit patterns used to encode the most probable token in each of these cases would be the same. Apparently you are saying this is not true, perhaps I am describing a "subset" of huffman encoding that would use constant trees, and of course be less efficient. Basically in my scheme of things any file that has all 256 possible bytes represented in it (the file contains 256 unique tokens) would have the same bit patterns/bit lengths used to encode its tokens. The tokens associated with the bit patterns/bit lengths would vary depending on their relative probabilities in that one file, but the encoding bit patterns/bit lengths would remain the same between the 2 files. HMMM I thought huffman only considered the RELATIVE probabilities of tokens in a file, perhaps you are thinking of a more advanced algorithm based on huffman encoding. -Roger Uzun UUCP: {hplabs!hp-sdd ucsd nosc}!crash!pnet01!uzun ARPA: crash!pnet01!uzun@nosc.mil INET: uzun@pnet01.cts.com