Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!ihnp4!qantel!lll-lcc!lll-crg!seismo!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: net.crypt
Subject: Re: Code Breaking
Message-ID: <331@brl-smoke.ARPA>
Date: Sun, 27-Apr-86 22:38:39 EDT
Article-I.D.: brl-smok.331
Posted: Sun Apr 27 22:38:39 1986
Date-Received: Fri, 2-May-86 22:40:26 EDT
References: <113@radha.UUCP>
Reply-To: gwyn@brl.ARPA
Organization: Ballistic Research Lab (BRL)
Lines: 80

In article <113@radha.UUCP> sanand@radha.UUCP (Sanand Patel) writes:
>plaintext--> Some very important text that must be kept secret
>My Key ----> ThisIsMySecretKeyThisIsMySecretKeyThisIsMySecretK
>
>Coded-------> Exclusive-Or of S+T, o+h, m+i etc.
>
>plaintext-->  Exclusive-Or of c1+T, c2+h , c3+i etc.
>on other side
>
>Now, not withstanding how the key is passed, is the above scheme breakable,
>especially with very large keys (say 100 or 200 letters) ?

How hard it is to break this depends on how much ciphertext is
available and how long the key is.  Step by step:

A simple form of Fourier analysis can fairly reliably determine the
period of the repeating key.  This allows us to "stack" statistics
for each cell of the cycle (this method is credited to Kasiski):

ThisIsMySecretKey	(key unknown except for length)

Some very importa	(actually, ciphertext is used here
nt text that must	 since the plaintext isn't yet known)
 be kept secret..

Each column gives a uniliteral frequency distribution.  Further,
if there is enough text, one may be able to determine that some
columns are encrypted under the same key (by cross-correlations,
known as a "chi" test after Friedman).  Such columns can be
merged into a super-stacked frequency distribution:
(For simplicity, I assume monocase text & key here.)

ThisMyecrK

Someer imr
nt tt hatu
 be ptsect
o. v.ap...
m.ex.t ...
e.ke..r...
...y..t...
...t..s...
... ......

If the statistics are good enough, you can assume the most frequent
letter in each column is "E", etc.  Or, assuming some simple method
such as cyclic (Caesar) addition or XOR was used to combine the key
with the plaintext, one may be able to check each column distribution
against the small number of possibilities (more correlation), and
thereby automatically determine the column keys.

In other words, this method is not very secure except for very short
messages per key.

Supposing one were to try to defeat Kasiski analysis by using a non-
repeating key?  There are two possibilities:  (1) random key, as in
the one-time pad; (2) nonrandom key, such as text from an agreed-upon
book.  In the latter case, one tries to simultaneously reconstruct
the key text and the plaintext; if you can guess a good crib (the
standard example is to assume the plaintext starts off "REFERENCE
YOUR MESSAGE ..."), then you can tell quickly that you got it right,
since the recovered key is readable.  It then is fairly easy to
simultaneously unravel the rest of the key and plaintext.

The second most hilarious encryption scheme I ever heard of used just
a short key to get started, then when the key ran out it used the
already-encrypted plaintext to provide the rest of the key, as in:

ThisIsMySecretKeySome very important text that mu.. (KEY)
Some very important text that must be kept secret.. (PLAINTEXT)

(The most hilarious scheme was similar, but instead used the
previous ciphertext as key for the remainder of the message!
Exercise for the student:  Why is this even funnier than the other?)

To be really secure without using a one-time key, you have to make
sure that the key is unintelligible AND that any structure it has
is insufficient to exploit by algebraic statistics.  For example,
a shift-register (CRC style) pseudo-random sequence has plenty of
internal structure and is unsafe for long message encryption.