Xref: utzoo sci.crypt:2766 comp.lang.c:26757
Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!apple!xanadu!michael
From: michael@xanadu.com (Michael McClary)
Newsgroups: sci.crypt,comp.lang.c
Subject: Re: New(?) encryption algorithm
Message-ID: <1990Mar9.233612.22226@xanadu.com>
Date: 9 Mar 90 23:36:12 GMT
References: <1877@bruce.OZ> <100295@linus.UUCP>
Organization: Xanadu Operating Company, Palo Alto, CA
Lines: 18

In article <100295@linus.UUCP> hal@gateway.mitre.org (Hal Feinstein) asks
which is the "correct" way to count bigrams:
 - Each odd character, followed by its successor.
 - Each character, followed by its successor.

If what you're looking for is bigram frequency, to check for Caesar
cyphers and the like, the second is the way to go.  Consider that the
first approach will miss every instance of "th" that starts on an
even-numbered position.
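
To make the difference concrete, here is a minimal sketch in C of both
counting methods.  The names count_bigrams and letter_index are
hypothetical (nothing here comes from the original articles), and the
code assumes ASCII text, folds case, and skips any pair that contains a
non-letter.  A step of 1 gives the second method (every character with
its successor); a step of 2 gives the first (each odd character with
its successor).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a letter to 0..25, or -1 for anything else.
 * Assumes an ASCII-style character set where 'a'..'z' are contiguous. */
static int letter_index(char ch)
{
    int c = tolower((unsigned char)ch);
    return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
}

/* Hypothetical helper: tally letter bigrams of text into counts.
 *   step == 1: each character followed by its successor (overlapping).
 *   step == 2: each odd character (1st, 3rd, ...) followed by its successor.
 * counts is zeroed first.  Returns the number of pairs tallied. */
size_t count_bigrams(const char *text, size_t step, unsigned counts[26][26])
{
    size_t len = strlen(text);
    size_t tallied = 0;
    size_t i;

    assert(step == 1 || step == 2);
    memset(counts, 0, 26 * 26 * sizeof counts[0][0]);

    for (i = 0; i + 1 < len; i += step) {
        int a = letter_index(text[i]);
        int b = letter_index(text[i + 1]);
        if (a >= 0 && b >= 0) {
            counts[a][b]++;
            tallied++;
        }
    }
    return tallied;
}
```

On the four-character text "xthe", a step of 1 tallies "xt", "th" and
"he", while a step of 2 tallies only "xt" and "he": the "th" starts on
position 2 and is never seen.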

In cyphers where each character is treated separately and similarly,
you can expect the two methods to produce similar results, but the
first to require twice as much cyphertext to give you an equivalent
amount of information.

If the cypher is chunking the data (with an even number of characters
per chunk), or otherwise treating odd and even cyphertext characters
differently, the first method might give you more useful information.