Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!ginosko!rex!ukma!gatech!hubcap!ncrcae!ncr-sd!hp-sdd!ucsdhub!sdcsvax!beowulf!demers From: demers@beowulf.ucsd.edu (David E Demers) Newsgroups: comp.ai.neural-nets Subject: Re: Data Complexity Message-ID: <7261@sdcsvax.UCSD.Edu> Date: 18 Oct 89 20:37:44 GMT References: <4542@imagen.UUCP> Sender: nobody@sdcsvax.UCSD.Edu Reply-To: demers@beowulf.UCSD.EDU (David E Demers) Organization: EE/CS Dept. U.C. San Diego Lines: 71 In article ted@nmsu.edu (Ted Dunning) writes: >ivan bach is very enthusiastic, but relatively uninformed about recent >developments in determining the complexity of finite sequences. Agreed. >he also has an annoying tendency to post news that is more than 80 >columns. Ditto. >here is a partial commentary on his posting, >In article <4542@imagen.UUCP> ib@apolling (Ivan N. Bach) writes: [discussion of Bach's posting + critical commentary deleted] >---------------------------------------------------------------- >shannon's formulae are simple enough to be misused extensively by all >kinds of people. such misuse does not mean that they are any more >applicable than they were in 1949. Despite the misuse, the basic concept is very valuable. >a much more sound approach to the complexity of sequences was proposed >in the 60's by various soviet mathematicians, and was popularized for >use with chaotic systems by v.i. arnold in the 70's. this model >defines the complexity of a finite string to be the size of the >smallest computer program which can be used to recreate the sequence. >in arnold's work, the sequence was the itinerary of a finite map, but >this does not really matter if we are starting with a sequence from a >finite alphabet. This work goes by the name of "Kolmogorov Complexity", usually. Solomonoff, Kolmogorov, Chaitin did much of the pioneering work in the early 60's, from independent reasons - examining the notion of randomness [Kolmogorov] and inference [Solomonoff]. >obviously, the upper bound on the complexity of a finite sequence is >the sequence size itself, since we can easily write a program for, say, >a ram machine which builds a list containing the sequence in the same >number of steps as the original sequence is long. > >the complexity of one of our 100 character substrings of {abcd}* is >much less, since we can write a very simple program which produces >one of these substrings. > >in practice, information content of a complex sequence can be >estimated by building relatively simple models for inter-symbol >dependencies and then measuring the entropy of the errors the model >makes in predicting the next symbol. this is nearly equivalent to the >approach used by arnold and co, since if you can't build a predictor >at all, then the error stream becomes equivalent to the original >sequence. The above is a very good quick description of the concept. It essentially quantifies Occam's Razor in modeling. Computational learning theory takes much of these ideas; Valiant, et al make use of the same principle. I apologize for once again posting with the 'F' key before gathering citations. Walt Savitch just gave a talk last week on Kolmogorov Complexity for the Cognitive Science seminar here at UCSD; I will scare up a summary and some references and post soon for those interested in pursuing the idea in more depth. Dave DeMers demers@cs.ucsd.edu Dept. of Computer Science & Engineering demers@inls1.ucsd.edu UCSD {other non-Internet mailpaths La Jolla, CA 92093 according to normal syntax...}