Path: utzoo!utgpu!water!watmath!clyde!burl!codas!killer!ninja!sys1!sneaky!gordon
From: gordon@sneaky.UUCP
Newsgroups: sci.crypt
Subject: Re: how do you tell encrytped data from
Message-ID: <-63293657@sneaky>
Date: 10 Jan 88 07:24:00 GMT
References: <660@bucket.UUCP>
Lines: 75
Nf-ID: #R:bucket.UUCP:660:sneaky:-63293657:000:4348
Nf-From: sneaky.UUCP!gordon    Jan 10 01:24:00 1988


> An interesting question has crossed my mind. If someone presents you with
> an allegedly encrypted message, How can you tell if it really is encrypted
> as opposed to being a bunch of random characters?

You can't.  And if I wanted to really screw things up, and had plenty of
transmission bandwidth, I could use an encoding method that was mostly,
but not completely, noise.  For example, each netnews article I post could
have encoded in it one bit, which is either 1 or 0 depending on whether the
number of lines in it plus the number of newsgroups it is cross-posted to
is odd or even.  I would have to post about 200 netnews articles to transmit
one sentence.  I defy anyone to prove that there is really a message!
For a less blatant waste of transmission bandwidth, how about using only the
"parity" bits to contain the message, while the lower 7 bits contain
random quotes from "talk.abortion" and "talk.bizarre"?
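
As a rough sketch of the "parity bit" idea, assuming the parity bit is the
high-order bit (bit 7) of each byte and the cover text is plain 7-bit ASCII:
the names embed() and extract() are made up for illustration, and the
receiver is assumed to know the message length in advance.

```python
def embed(message: bytes, cover: bytes) -> bytes:
    """Hide message bits in bit 7 of each cover byte (cover assumed 7-bit ASCII)."""
    # Flatten the message into bits, most significant bit of each byte first.
    bits = [(byte >> (7 - k)) & 1 for byte in message for k in range(8)]
    if len(cover) < len(bits):
        raise ValueError("cover text too short: need one cover byte per message bit")
    out = bytearray(c & 0x7F for c in cover)   # clear the high bit of every cover byte
    for i, bit in enumerate(bits):
        out[i] |= bit << 7                     # the message bit goes in the high bit
    return bytes(out)


def extract(stego: bytes, n_bytes: int) -> bytes:
    """Recover n_bytes of message from the high bits of the first 8*n_bytes bytes."""
    msg = bytearray()
    for j in range(n_bytes):
        value = 0
        for k in range(8):
            value = (value << 1) | (stego[8 * j + k] >> 7)
        msg.append(value)
    return bytes(msg)
```

Masking the result with 0x7F gives back the cover text unchanged, so anyone
who strips the high bit sees only the quotes.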

> I know that transposition and *simple* substitution can be detected by
> letter frequency analysis. But is a "flat" distribution evidence of random
> data?

No.  Among other things, crypt(1) output gives a fairly flat distribution
over its 256-character output alphabet.  (The version of crypt I am thinking
of is based on a WW-II rotor machine and uses DES only to scramble the user's
typed key, so you don't need to crack DES to crack the encryption.  This
version was in System III, among others.)  In fact, it gives an absolutely
flat distribution for any key and any message consisting of one character
repeated a multiple of 256 times.  (If the length is not a multiple of 256,
the distribution is as flat as you can get with that message length.)

You also get a flat distribution, more or less, with encoding techniques
such as "encode the i th character of the message by rotating it i positions
in the alphabet", assuming the message is reasonably long.  This doesn't even
have an encryption key.  A "real" encryption might use a keyword J characters
long, and encode the i th character of the message by rotating it by its
position in the message (i) plus an amount determined from the (i mod J)th
character of the key.  For messages that are long relative to the key length
and the alphabet size, this will give a fairly flat distribution.  You will
still be able to detect distribution patterns by taking every
(J * alphabet size)th character, all of which were rotated by the same
amount, and looking at the frequency distribution for that.
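
A minimal sketch of the rotation scheme just described, assuming a 26-letter
alphabet with messages and keys restricted to A-Z, and assuming the "amount
determined from" a key character is simply its position in the alphabet.
The function names are invented for the example.

```python
A = ord("A")
N = 26  # alphabet size; messages and keys are assumed to be A-Z only


def rotate(text: str, key: str = "", decode: bool = False) -> str:
    """Rotate the i-th letter by i, plus the (i mod J)-th key letter if a key is given."""
    out = []
    for i, ch in enumerate(text):
        shift = i + (ord(key[i % len(key)]) - A if key else 0)
        if decode:
            shift = -shift                      # decoding rotates the other way
        out.append(chr((ord(ch) - A + shift) % N + A))
    return "".join(out)


def every_nth(text: str, step: int) -> str:
    """Characters at positions 0, step, 2*step, ..."""
    return text[::step]
```

With no key, rotate("E" * 26) contains each letter exactly once, which is as
flat as it gets.  With the key "KEY" (J = 3), every_nth(rotate("E" * 312,
"KEY"), 3 * N) comes out as "OOOO": every 78th character was rotated by the
same amount, so the plaintext frequencies show through in that subsample.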

> For my purposes, both "one-time pad" ciphers and anything that operates on
> units other than characters can be considered random! If it is that complex,
> then I'm not likely to crack it!

I'm not quite sure how to interpret this.  Would this mean that running the
message through uuencode(1) (which maps every 3 characters of 8 bits each
into 4 characters out of a 6-bit 64-character alphabet) is too complex?  It 
doesn't even have an encryption key!  Running stuff through uuencode does tend 
to flatten out the distribution somewhat, but a message consisting mostly of
the letter x will still have several high-frequency characters.
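
To make that concrete, here is a sketch of just the 3-bytes-to-4-characters
mapping, assuming the classic convention of adding 32 (a space) to each
6-bit value.  It leaves out the per-line length character and the
begin/end lines that the real uuencode(1) emits, and uu_body() is a name
made up for this example.

```python
from collections import Counter


def uu_body(data: bytes) -> str:
    """Map each 3 input bytes to 4 printable characters, 6 bits per character."""
    data += b"\0" * (-len(data) % 3)           # pad to a multiple of 3 bytes
    out = []
    for i in range(0, len(data), 3):
        word = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]   # 24 bits
        for shift in (18, 12, 6, 0):
            out.append(chr(32 + ((word >> shift) & 0x3F)))          # 6 bits -> ' '..'_'
    return "".join(out)


# 300 copies of 'x' become 400 characters drawn from only four symbols.
counts = Counter(uu_body(b"x" * 300))
```

Here counts holds exactly four characters ('>', "'", 'A' and 'X'), each
appearing 100 times: the single spike at 'x' has turned into four spikes,
not a flat distribution.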

To REALLY flatten out the distribution, use one of the several popular UNIX
and DOS utilities classified as "data compression programs".  These include
pack, compress, ARC, and compact.  If you don't like the use of all 8 bits
in the output, run the result through a filter to transform it to a smaller
alphabet.  Such filters include uuencode/uudecode, btoa/atob (distributed
with compress 4.0), encode/decode (distributed with B news 2.11), and, if
you're really desperate, od, the octal dump program.  (A few years back,
someone posted a program to take an octal dump and re-create the file it
was a dump of.)
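
One way to see the flattening is to compare the byte-frequency entropy
before and after compression.  The sketch below uses zlib as a stand-in
(it is not one of the programs named above), and the sample text is a
made-up stream of words from a small vocabulary, not real netnews.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits(data: bytes) -> float:
    """Shannon entropy of the byte-frequency distribution, in bits per byte."""
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


# Hypothetical sample text: words drawn pseudo-randomly from a small vocabulary.
words = "the quick brown fox jumps over lazy dog and then runs away from home".split()
rng = random.Random(0)
text = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")
packed = zlib.compress(text, 9)

h_text = entropy_bits(text)      # bounded by log2(number of distinct characters)
h_packed = entropy_bits(packed)  # 8.0 would be a perfectly flat 256-symbol distribution
```

The text uses only lower-case letters and spaces, so h_text cannot exceed
log2(27), while the compressed bytes spread over the whole 8-bit range and
h_packed comes out higher.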

These programs do not encrypt (that is, there is no encryption key, and just
knowing which programs are used and in what order is sufficient to recover
the plaintext).  To get reasonable security, transform the text in several
stages:

plaintext -> compress -> your favorite encryption -> uuencode -> ciphertext

Compress first, then encrypt.  You save on message transmission costs and
you mess up some clues used to attack the encryption - such as "all characters
going into the encryption probably have the high-order bit turned off", because
the compression removed much of the redundancy.
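
The staging above might be sketched like this.  Everything here is a
stand-in: zlib plays the part of compress, binascii's uuencode routines
produce only the body lines (no begin/end wrapper), and
toy_keystream_xor() is a PLACEHOLDER for "your favorite encryption",
invented for this example and not something to rely on for real security.

```python
import binascii
import hashlib
import zlib


def toy_keystream_xor(data: bytes, key: bytes) -> bytes:
    """PLACEHOLDER cipher: XOR with SHA-256(key || counter) blocks.  Not vetted."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + (block // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[block:block + 32], pad))
    return bytes(out)


def seal(plaintext: bytes, key: bytes) -> bytes:
    """plaintext -> compress -> encrypt -> uuencode"""
    encrypted = toy_keystream_xor(zlib.compress(plaintext), key)
    # b2a_uu accepts at most 45 bytes per call and returns one text line each time.
    return b"".join(binascii.b2a_uu(encrypted[i:i + 45])
                    for i in range(0, len(encrypted), 45))


def unseal(ciphertext: bytes, key: bytes) -> bytes:
    """Undo the stages in reverse order: uudecode -> decrypt -> decompress."""
    encrypted = b"".join(binascii.a2b_uu(line)
                         for line in ciphertext.splitlines(keepends=True))
    return zlib.decompress(toy_keystream_xor(encrypted, key))
```

Since XOR with the same keystream is its own inverse, the one function
serves for both directions; a real cipher would need separate encrypt and
decrypt steps in the middle of seal() and unseal().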

					Gordon Burditt
					...!ihnp4!sys1!sneaky!gordon