Xref: utzoo sci.crypt:2773 comp.lang.c:26859
Path: utzoo!attcan!uunet!aplcen!samsung!think!linus!gateway.mitre.org!hal
From: hal@gateway.mitre.org (Hal Feinstein)
Newsgroups: sci.crypt,comp.lang.c
Subject: Re: New(?) encryption algorithm
Message-ID: <102375@linus.UUCP>
Date: 13 Mar 90 15:34:53 GMT
References: <1877@bruce.OZ> <100295@linus.UUCP> <1990Mar9.233612.22226@xanadu.com>
Sender: news@linus.UUCP
Reply-To: hal@gateway.mitre.org (Hal Feinstein)
Organization: The Mitre Corporation
Lines: 39

I'd like to thank all those who responded with the "correct" way to
count bigrams. Interestingly, the standard reference work
on this subject, Kullback's Statistical Methods in Cryptanalysis 
is silent on the subject. Here's a summary of what people said
on the subject:

The debate in our telecommunications security class was over
how to do the 2-gram correlation technique for solving 
monoalphabet ciphers suggested in Konheim p. 79 (Cryptography,
A Primer). At that point we had not considered  more complex
ciphers  but got stuck (for a while) trying to figure this out.
At least one reply said it was obvious. Well, some of us are
still learning.  I can't say it obvious right now.   

To paraphrase what most people said.  (1)If the symbols in the
underlying alphabet are single letters (simple transposition for
example) then sliding one character at a time is OK.  
(2) If the symbols in the underlying
alphabet are blocks of characters, for example two characters 
per symbol, then slide by the block length.  This makes sense
since forming bigrams by sliding one character a time across
a bigram substitution cipher fractures the bigrams, 
probabily wrecking the accuracy of the tally.

The problem is what to do when confronted with a unknown
cryptogram?  How should you do the tally?  One reply stated that
if a null had been inserted as the first character and you counted
on even boundaries, you would get junk. So best to take both types
of counts:  One starting on the odd boundary and the other on an 
even boundary. 

Another said: just discover how your bigram table was counted and
make sure you count in the same way. Hmmm...

One respondent advised using Markov analysis to discern hidden 
processes as a way to discover the underlying cipher. OK, seems 
reasonable.  Another reply suggested that the ciphertext is really
generated by a non-stationary statistical source which  explains why
decimated bigram averages would differ.