Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!philabs!polaris!josh
From: josh@polaris.UUCP (Josh Knight)
Newsgroups: net.crypt
Subject: Re: Censorship on the net
Message-ID: <527@polaris.UUCP>
Date: Wed, 14-May-86 18:33:30 EDT
Article-I.D.: polaris.527
Posted: Wed May 14 18:33:30 1986
Date-Received: Fri, 23-May-86 21:53:58 EDT
References: <3660@sun.uucp> <271@atari.UUcp>
Reply-To: josh@polaris.UUCP (Josh Knight)
Distribution: net
Organization: IBM Research, Yorktown Heights, N.Y.
Lines: 57
Summary: It's easy to detect cipher text

In article <271@atari.UUcp> dyer@atari.UUcp (Landon Dyer) writes:
>
>How do you distinguish --- and clobber --- encrypted messages if you
>don't want them going through?
>

It should be very easy to tell encrypted text from plain text.  The
distribution of characters will be very different.  Just for example
consider the table below:

        Plain   Encrypted

        17095     555
        11000     554
         7655     553
         7329     549
         6985     547
         6844     541
         6272     539
         6267     536
         5800     535
         3973     532

This is the distribution of the number of characters, sorted by the
frequency.  I.e. the first number in left column is the number of
occurrences of the most frequent character in the plain text (blank)
and the first number in the right column is the number of occurrences
of the most frequent character in the encrypted text (some unprintable
thing).  The document was input for a text formatter (so period is
higher than you might think) but the point is that the encrypted text
LOOKS random (has high entropy) while the plain text does not.

This particular document has about 126 K characters, so the average
number of occurrences per character (the original character set was
EBCDIC, so 8 bits are mandatory) is about 500.  The minimum number of
occurrences for the encrypted text is 439, while for the plain text
only the top ranked 88 characters had non-zero counts, the rest being 0.

The distribution of character pairs is even more striking.
Again, the encrypted text has an almost "flat" distribution: its most
frequent pair occurred only 10 times.  The encrypted text has MANY more
different pairs (about 56 K) than the plain text, which had only about
2 K different pairs, the most frequent of which ('e ') occurred about
3 K times.

Note that detecting and clobbering news items this way will also remove
items with totally random content.  This would affect some news groups,
but the effect might be considered beneficial in any event ;-).

The encryption was done by an IBM product, which for the purposes
of this discussion uses plain DES.  This of course does not change
the fact that any opinions expressed or implied are mine and not my
employer's.
-- 

	Josh Knight, IBM T.J. Watson Research
josh@ibm.com, josh@yktvmh.bitnet, ...!philabs!polaris!josh
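
The counting described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the program
used for the table: the function names, the 7.5 bits-per-byte threshold,
and the test data are all made up for the example.  In particular the
"cipher-like" input is pseudorandom bytes standing in for DES output,
and the "plain" input is one English sentence repeated to about 126 K
characters, so its counts will not match the table.

```python
import math
import random
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def ranked_counts(data: bytes, top: int = 10) -> list:
    """Occurrence counts of the most frequent byte values, highest first."""
    return [count for _, count in Counter(data).most_common(top)]


def distinct_pairs(data: bytes) -> int:
    """Number of different adjacent byte pairs that occur in the data."""
    return len(set(zip(data, data[1:])))


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Hypothetical filter: flag data whose entropy is near the 8-bit maximum.

    The threshold is an assumption chosen for this sketch.
    """
    return shannon_entropy(data) >= threshold


# Stand-ins for the two documents (assumptions, not the original data).
plain = b"It should be very easy to tell encrypted text from plain text. " * 2000
rng = random.Random(0)  # fixed seed so the run is repeatable
cipher_like = bytes(rng.getrandbits(8) for _ in range(len(plain)))

for name, data in (("plain", plain), ("cipher-like", cipher_like)):
    print(name,
          ranked_counts(data, 3),
          round(shannon_entropy(data), 2),
          distinct_pairs(data),
          looks_encrypted(data))
```

As noted above, a test like this flags anything with a flat byte
distribution, so totally random content is caught along with cipher text.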