Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!ll-xn!mit-eddie!uw-beaver!cornell!rochester!PT.CS.CMU.EDU!NL.CS.CMU.EDU!mlm
From: mlm@NL.CS.CMU.EDU (Michael Mauldin)
Newsgroups: sci.crypt
Subject: Re: turning a plaintext source into a good "random" key
Message-ID: <1009@PT.CS.CMU.EDU>
Date: 1 Mar 88 23:41:02 GMT
References: <8803011848.AA05787@decwrl.dec.com>
Sender: netnews@PT.CS.CMU.EDU
Organization: Carnegie-Mellon University, CS/RI
Lines: 70

In article <8803011848.AA05787@decwrl.dec.com>, kruger@16bits.dec.com writes:
> The trick is to correctly extract the random element, while somehow
> 'chopping up' the regularities.

For something more demonstrably "random," why not use your favorite
redundancy reducer (read: file compressor) to destroy the pattern?
Use a Huffman code based on bigrams, or Lempel-Ziv compress.  The
better the compress routine, the less redundancy there will be in the
key stream.  You may still have to deal with some regularities in the
resulting bits.

Your suggestion of ignoring vowels and using the low order bits
doesn't seem to work well at all (did you test it?).  Here are
histograms for the text of the US Constitution (50,090 characters):

Character counts (ignoring punctuation and digits < 100 occurrences):

SP 10397 ****************************************************************
,    642 ***
.    377 **
a   2779 *****************
b    659 ****
c   1276 *******
d   1307 ********
e   5385 *********************************
f   1077 ******
g    454 **
h   2041 ************
i   2607 ****************
j    100
k     46
l   1496 *********
m    792 ****
n   2708 ****************
o   2856 *****************
p    834 *****
q     52
r   2325 **************
s   2831 *****************
t   3913 ************************
u    883 *****
v    481 **
w    385 **
x    129
y    553 ***
z     31

Low 4 bits of all letters except vowels:

0000 11248 ****************************************************************
0001  2943 ****************
0010  3037 *****************
0011  4149 ***********************
0100  5241 *****************************
0101  6287 ***********************************
0110  1584 *********
0111   862 ****
1000  2198 ************
1001  3200 ******************
1010   160
1011   128
1100  2166 ************
1101   918 *****
1110  3113 *****************
1111  2856 ****************

So you see there is still quite a lot of structure left.

Michael L. Mauldin (Fuzzy)              Department of Computer Science
ARPA: Michael.Mauldin@NL.CS.CMU.EDU     Carnegie Mellon University
Phone: (412) 268-3065                   Pittsburgh, PA 15213-3890
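
For anyone who wants to repeat the experiment on their own text, here
is a minimal sketch in Python of the two measurements discussed in the
post: a histogram of the low 4 bits of each non-vowel letter, and the
same tally over the bytes of a compressed copy of the text.  The
function names are made up for illustration, "letters except vowels"
is read here as ASCII alphabetic characters other than a, e, i, o, u,
and zlib (DEFLATE) stands in for the Huffman and Lempel-Ziv compressors
named in the post, so treat this as one possible reading and not as
the program that produced the figures in the post.

```python
import math
import zlib
from collections import Counter

VOWELS = set("aeiouAEIOU")


def low_bits_histogram(text, nbits=4):
    """Tally the low `nbits` bits of every non-vowel letter in `text`."""
    mask = (1 << nbits) - 1
    counts = Counter()
    for ch in text:
        # Assumption: only ASCII letters count; spaces, digits and
        # punctuation are skipped along with the vowels.
        if ch.isascii() and ch.isalpha() and ch not in VOWELS:
            counts[ord(ch) & mask] += 1
    return counts


def compressed_low_bits(text, nbits=4):
    """Tally the low `nbits` bits of each byte of the zlib-compressed text.

    The zlib stream includes a short header and a checksum trailer,
    which are counted here too; a real key generator would strip them.
    """
    mask = (1 << nbits) - 1
    packed = zlib.compress(text.encode("utf-8"), 9)
    return Counter(b & mask for b in packed)


def entropy_bits(counts):
    """Shannon entropy, in bits per symbol, of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total)
                for c in counts.values() if c)


def print_histogram(counts, nbits=4, width=64):
    """Print one row per bit pattern, scaled so the peak is `width` stars."""
    peak = max(counts.values())
    for value in range(1 << nbits):
        n = counts.get(value, 0)
        print(f"{value:0{nbits}b} {n:6d} {'*' * (n * width // peak)}")


if __name__ == "__main__":
    sample = ("We the People of the United States, in Order to form a "
              "more perfect Union, establish Justice, insure domestic "
              "Tranquility, provide for the common defence")
    for label, counts in (("raw letters", low_bits_histogram(sample)),
                          ("zlib output", compressed_low_bits(sample))):
        print(f"{label}: {entropy_bits(counts):.3f} bits of a possible 4")
        print_histogram(counts)
```

A perfectly flat 16-bin histogram would score 4.0 bits per symbol, so
the gap between that and the printed figure is a rough measure of how
much structure is left.  On a sample this short the numbers are noisy;
a text the size of the Constitution gives a much steadier picture.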