Path: utzoo!utgpu!water!watmath!clyde!burl!codas!killer!ninja!sys1!sneaky!gordon
From: gordon@sneaky.UUCP
Newsgroups: sci.crypt
Subject: Re: how do you tell encrytped data from
Message-ID: <-63293657@sneaky>
Date: 10 Jan 88 07:24:00 GMT
References: <660@bucket.UUCP>
Lines: 75
Nf-ID: #R:bucket.UUCP:660:sneaky:-63293657:000:4348
Nf-From: sneaky.UUCP!gordon Jan 10 01:24:00 1988

> An interesting question has crossed my mind. If someone presents you with
> an allegedly encrypted message, How can you tell if it really is encrypted
> as opposed to being a bunch of random characters?

You can't. And if I wanted to really screw things up, and had plenty of
transmission bandwidth, I could use an encoding method that was mostly,
but not completely, noise. For example, each netnews article I post
could have encoded in it one bit, which is either 1 or 0 depending on
whether the number of lines in it plus the number of newsgroups it is
cross-posted to is odd or even. I would have to post about 200 netnews
articles to transmit one sentence. I defy anyone to prove that there is
really a message!

For a less blatant waste of transmission bandwidth, how about using only
the "parity" bits to carry the message, while the lower 7 bits contain
random quotes from "talk.abortion" and "talk.bizarre"?

> I know that transposition and *simple* substitution can be detected by
> letter frequency analysis. But is a "flat" distribution evidence of random
> data?

No. Among other things, crypt(1) output gives a fairly flat distribution
in its 256-character output alphabet. (The version of crypt I am
thinking of is based on a WW-II rotor machine, and uses DES only to
scramble the user's typed key. You don't need to crack the DES to crack
the encryption. This version was in System III, among others.) In fact,
it gives an absolutely flat distribution for any message consisting of
one character repeated a multiple of 256 times and any key.
(If not a multiple of 256, the distribution is as flat as you can get
with that message length.)

You also get a flat distribution, more or less, from encoding techniques
such as "encode the i-th character of the message by rotating it i
positions in the alphabet", assuming the message is reasonably long.
This doesn't even have an encryption key. A "real" encryption might use
a keyword J characters long, and encode the i-th character of the
message by rotating it by its position in the message (i) plus an amount
determined from the (i mod J)th character of the key. For messages that
are long relative to the key length and the alphabet size, this will
give a fairly flat distribution. You will be able to detect distribution
patterns by taking every (J * alphabet size)th character, and looking at
the frequency distribution for that.

> For my purposes, both "one-time pad" ciphers and anything that operates on
> units other than characters can be considered random! If it is that complex,
> then I'm not likely to crack it!

I'm not quite sure how to interpret this. Would this mean that running
the message through uuencode(1) (which maps every 3 characters of 8 bits
each into 4 characters out of a 6-bit, 64-character alphabet) is too
complex? It doesn't even have an encryption key! Running stuff through
uuencode does tend to flatten out the distribution somewhat, but a
message consisting mostly of the letter x will still have several
high-frequency characters.

To REALLY flatten out the distribution, use one of the several popular
UNIX and DOS utilities classified as "data compression programs". These
include pack, compress, ARC, and compact. If you don't like the use of
all 8 bits in the output, run the result through a filter to transform
it to a smaller alphabet. Such filters include uuencode/uudecode,
btoa/atob (distributed with compress 4.0), encode/decode (distributed
with B news 2.11), and, if you're really desperate, od, the octal dump
program.
(A few years back, someone posted a program to take an octal dump and
re-create the file it was a dump of.) These programs do not encrypt:
there is no encryption key, and just knowing which programs are used and
in what order is sufficient to recover the plaintext.

To get reasonable security, transform the text in several stages:

    plaintext -> compress -> your favorite encryption -> uuencode -> ciphertext

Compress first, then encrypt. You save on message transmission costs and
you mess up some clues used to attack the encryption - such as "all
characters going into the encryption probably have the high-order bit
turned off" - because the compression removed much of the redundancy.

                                        Gordon Burditt
                                        ...!ihnp4!sys1!sneaky!gordon
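
A rough sketch of the staged pipeline described above, in Python, for
anyone who wants to poke at the "high-order bit" claim. Everything here
is an assumption made for illustration: zlib stands in for compress(1)
(it uses DEFLATE, not compress's LZW), binascii.b2a_uu supplies only
uuencode's line encoding without the begin/end lines, and toy_xor is a
hypothetical placeholder for "your favorite encryption" that provides no
real security. The helper names are illustrative, not from any existing
tool.

```python
import binascii
import zlib


def high_bit_fraction(data: bytes) -> float:
    """Fraction of bytes with the high-order bit (0x80) set."""
    return sum(b >> 7 for b in data) / len(data) if data else 0.0


def uuencode_body(data: bytes) -> bytes:
    """uuencode-style body lines only (no begin/end lines), 45 bytes per line."""
    return b"".join(
        binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)
    )


def uudecode_body(text: bytes) -> bytes:
    """Inverse of uuencode_body."""
    return b"".join(binascii.a2b_uu(line) for line in text.splitlines())


def toy_xor(data: bytes, key: bytes = b"not a real key") -> bytes:
    """Placeholder for 'your favorite encryption'. NOT secure; self-inverse."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encode(plaintext: bytes, encrypt=toy_xor) -> bytes:
    """plaintext -> compress -> encrypt -> uuencode."""
    return uuencode_body(encrypt(zlib.compress(plaintext)))


def decode(wire: bytes, decrypt=toy_xor) -> bytes:
    """uudecode -> decrypt -> decompress -> plaintext."""
    return zlib.decompress(decrypt(uudecode_body(wire)))


if __name__ == "__main__":
    text = b"Compress first, then encrypt. " * 40
    packed = zlib.compress(text)
    print("high-bit fraction, plaintext: ", high_bit_fraction(text))
    print("high-bit fraction, compressed:", high_bit_fraction(packed))
    print("sizes:", len(text), "->", len(packed), "->", len(encode(text)))
    assert decode(encode(text)) == text
```

Plain ASCII text never sets the high-order bit, while the compressed
stream does, so that particular clue is gone before the cipher ever sees
the data. Swap toy_xor for a real cipher by passing your own
encrypt/decrypt callables to encode and decode.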