Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!utgpu!water!watnot!watmum!smvorkoetter From: smvorkoetter@watmum.UUCP Newsgroups: comp.sys.ibm.pc,sci.crypt,comp.sources.wanted Subject: Data Compression Algorithms Message-ID: <924@watmum.UUCP> Date: Sun, 5-Apr-87 20:51:10 EST Article-I.D.: watmum.924 Posted: Sun Apr 5 20:51:10 1987 Date-Received: Sun, 5-Apr-87 23:48:01 EST Distribution: comp Organization: U of Waterloo, Ontario Lines: 44 Keywords: data compression algorithms text Xref: utgpu comp.sys.ibm.pc:2737 sci.crypt:287 comp.sources.wanted:849 In article <775@viper.UUCP> john@viper.UUCP (John Stanley) writes: >In article <383@newton.praxis.co.uk> mct%praxis.uucp@ukc.ac.uk writes: > >Once apon a time, about five years ago, a UK company called DJAI, (the > >authors of The Last One - a 4GL program generator) said that they had an > >algorithm which would (reversibly) compress arbitrary text into 40-80 bytes. > >They even had a demonstration system which claimed to show the algorithm in > >operation. > >I expect the algorithm has been suppressed by the UK MoD :-) > > This is a facinating story... Anyone else ever hear about this? > Anyone know -anything- (or have any guesses) about the algorithm? This is not possible. Here is a proof. (I am using ^ as a symbol for exponentiation) Consider all possible strings of 80 bytes. There are 80 * 8 = 640 bits in such a string. Thus there are 2^640 such strings. Similarily there are 2^639 strings of 639 bits. So, there are 2^640 + 2^639 + 2^638 + ... + 1 strings of less than or equal to 80 bytes. This adds up to 2^641 such strings. No consider the set of all text files. Assume we limit ourselves to just the uppercase and lowercase alphabet, and the space character. So we have a character set of 53 characters. There are a total of 1 + 53 + 53^2 + 53^3 + ... 53^MAX_FILE_SIZE text files. Since the algorithm is reversible, each possible text file must have a unique representation when compressed. Therefore, there must be enough unique representations. Thus, 2^641 <= 1 + 53 + 53^2 + ... 53^MAX_FILE_SIZE which is clearly not possible unless MAX_FILE_SIZE is quite small (much less than 641 characters). Q.E.D. Stefan Vorkoetter University of Waterloo