Xref: utzoo sci.crypt:2780 comp.lang.c:26894
Path: utzoo!attcan!uunet!samsung!zaphod.mps.ohio-state.edu!uwm.edu!ogicse!decwrl!shlump.nac.dec.com!mountn.dec.com!cadsys.enet.dec.com!cooper
From: cooper@cadsys.enet.dec.com
Newsgroups: sci.crypt,comp.lang.c
Subject: Re: New(?) encryption algorithm
Message-ID: <1449@mountn.dec.com>
Date: 14 Mar 90 15:09:34 GMT
References: <102375@linus.UUCP> <1877@bruce.OZ> <100295@linus.UUCP> <1990Mar9.233612.22226@xanadu.com>
Sender: news@mountn.dec.com
Reply-To: cooper@cadsys.enet.dec.com ()
Followup-To: sci.crypt
Organization: Digital Equipment Corporation
Lines: 31

This concerns whether to count discrete or overlapping digrams.

I'm surprised that no one seems to have pointed out the "answer" to
this, since it is part of standard statistics.

It depends on the nature of the tests you are performing on them.
Overlapping digrams give you about twice as much data, but the
frequencies obtained are not independent: each interior letter is
counted in two adjacent digrams. For example, it is not valid to use
a chi-square test on overlapping digrams to determine whether the
digrams deviate significantly from a chance distribution, since the
chi-square test assumes that the frequencies are independent. On the
other hand, it is perfectly valid to estimate the probability that an
X will be followed by a Y on the basis of overlapping digrams.

In practice, except for a few hard-and-fast yes/no automatic
procedures, in cryptographic work the increase in information from
using overlapping digrams (and trigrams, etc.) will generally outweigh
the slight loss in strict validity, since the information collected is
likely to be leavened with a large dose of intuitive, trial-and-error
adjustment by the cryptanalyst in later phases of the work.

Topher Cooper
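
To make the distinction between the two ways of counting concrete,
here is a minimal sketch in Python. It is not from the original
thread: the function names (overlapping_digrams, discrete_digrams,
follow_probability) and the sample string are illustrative
assumptions. It shows that the overlapping count yields roughly twice
as many digrams, and how an estimate of "X is followed by Y" can be
taken from the overlapping counts.

```python
from collections import Counter


def overlapping_digrams(text):
    """Every adjacent pair: positions (0,1), (1,2), (2,3), ..."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def discrete_digrams(text):
    """Non-overlapping pairs: positions (0,1), (2,3), ...
    A trailing unpaired letter is dropped."""
    return [text[i:i + 2] for i in range(0, len(text) - 1, 2)]


def follow_probability(text, x, y):
    """Estimate P(next letter is y | current letter is x)
    from overlapping digram counts. Returns None if x never
    appears as the first letter of a digram."""
    pairs = Counter(overlapping_digrams(text))
    starts_with_x = sum(n for pair, n in pairs.items() if pair[0] == x)
    if starts_with_x == 0:
        return None
    return pairs[x + y] / starts_with_x


if __name__ == "__main__":
    sample = "ABRACADABRA"  # hypothetical sample text
    print(len(overlapping_digrams(sample)), "overlapping digrams")
    print(len(discrete_digrams(sample)), "discrete digrams")
    print("P(B follows A) =", follow_probability(sample, "A", "B"))
```

Note that in the overlapping list each interior letter contributes to
two entries, which is the source of the dependence that rules out a
naive chi-square test on those counts.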