Xref: utzoo sci.crypt:2773 comp.lang.c:26859 Path: utzoo!attcan!uunet!aplcen!samsung!think!linus!gateway.mitre.org!hal From: hal@gateway.mitre.org (Hal Feinstein) Newsgroups: sci.crypt,comp.lang.c Subject: Re: New(?) encryption algorithm Message-ID: <102375@linus.UUCP> Date: 13 Mar 90 15:34:53 GMT References: <1877@bruce.OZ> <100295@linus.UUCP> <1990Mar9.233612.22226@xanadu.com> Sender: news@linus.UUCP Reply-To: hal@gateway.mitre.org (Hal Feinstein) Organization: The Mitre Corporation Lines: 39 I'd like to thank all those who responded with the "correct" way to count bigrams. Interestingly, the standard reference work on this subject, Kullback's Statistical Methods in Cryptanalysis is silent on the subject. Here's a summary of what people said on the subject: The debate in our telecommunications security class was over how to do the 2-gram correlation technique for solving monoalphabet ciphers suggested in Konheim p. 79 (Cryptography, A Primer). At that point we had not considered more complex ciphers but got stuck (for a while) trying to figure this out. At least one reply said it was obvious. Well, some of us are still learning. I can't say it obvious right now. To paraphrase what most people said. (1)If the symbols in the underlying alphabet are single letters (simple transposition for example) then sliding one character at a time is OK. (2) If the symbols in the underlying alphabet are blocks of characters, for example two characters per symbol, then slide by the block length. This makes sense since forming bigrams by sliding one character a time across a bigram substitution cipher fractures the bigrams, probabily wrecking the accuracy of the tally. The problem is what to do when confronted with a unknown cryptogram? How should you do the tally? One reply stated that if a null had been inserted as the first character and you counted on even boundaries, you would get junk. So best to take both types of counts: One starting on the odd boundary and the other on an even boundary. Another said: just discover how your bigram table was counted and make sure you count in the same way. Hmmm... One respondent advised using Markov analysis to discern hidden processes as a way to discover the underlying cipher. OK, seems reasonable. Another reply suggested that the ciphertext is really generated by a non-stationary statistical source which explains why decimated bigram averages would differ.