Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!sri-unix!hplabs!hpcea!hpfcdc!hpldola!hpldolm!ben
From: ben@hpldolm.UUCP
Newsgroups: comp.ai
Subject: Re: analysis of unknown data
Message-ID: <11160001@hpldolm.HP.COM>
Date: Wed, 18-Mar-87 15:48:17 EST
Article-I.D.: hpldolm.11160001
Posted: Wed Mar 18 15:48:17 1987
Date-Received: Sun, 22-Mar-87 21:20:03 EST
References: <5681@mimsy.UUCP>
Organization: HP Logic Design Oper. - ColoSpgs, CO
Lines: 65

I have two comments on this discussion; the first is general, the
second is specific.

My first comment on this whole discussion, as I understand it, is that
it is silly. We are being asked to find "the" meaning of some large
file without any context for the file. Is it text? Is it integer data?
Is it floating point data? Is it encrypted in any way? The search for
meaning in the absence of context is a waste of time. (In essence, I
agree with M. B. Brilliant as follows.)

What is meaningful in one context is often not meaningful in another.
However, sometimes, it is.

A file full of integer measurement data will usually be
indistinguishable from a file of a bit-mapped color image. A bunch of
integers is a bunch of integers (unless some *recognizable* context
information is included). If you take a group of integers and make a
pretty picture with them, what will you do when I tell you that they
were process measurements from a ball-bearing factory? What will you
do when you interpret a Mandelbrot image as a bad lot of wafers in an
otherwise well-controlled fab?

I'm sure that you would like to say that you can't make a pretty
picture with ball-bearing data. Perhaps not in every case, but I know
of a gentleman who *sells* "art" generated from HP stock performance
data. He has given some stock data meaning in a new context.

The best response to this question was the one from Mr. Adrian, who
suggested that you look for the context(s) that the file was used in.
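
To make the "a bunch of integers is a bunch of integers" point
concrete, here is a minimal Python sketch. The 16 bytes in `raw` are
made up for illustration (they are not from the file under
discussion), and the four readings shown are only a few of the
contexts one could assume.

```python
import struct

# Hypothetical file contents: 16 bytes with no accompanying context.
raw = b"HP stock data 87"

# The same bytes under four assumed contexts.
as_text = raw.decode("ascii")            # ASCII text
as_int32 = struct.unpack("<4i", raw)     # four little-endian 32-bit integers
as_float32 = struct.unpack("<4f", raw)   # four little-endian 32-bit floats
as_pixels = list(raw)                    # sixteen 8-bit gray levels

print("text   :", as_text)
print("int32  :", as_int32)
print("float32:", as_float32)
print("pixels :", as_pixels)
```

Every one of these readings decodes without error, and nothing in the
bytes themselves says which one was intended.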
If you can't find the correct context, you cannot ascertain the
correct meaning. If the data exists in a vacuum, you can choose
whatever context you wish, and with enough massaging you can make the
data meaningful.

Second comment:

> Testing for randomness might be the first test; sure would save

"Random" is too loose a term. Are they "random" samples from a uniform
distribution, or "random" samples from a Gaussian distribution? In
either case, is the distribution a real population, or a mathematical
model of a distribution function?

I don't want to sound like a flame, but testing for randomness is
ridiculous! You *cannot* prove a set of data to be "random." In fact,
the key to some encryption schemes is to make a dataset appear
"random" to most simple-minded tests. This does not mean that there is
no information in the data. It just means that the context of the
information is well hidden from such simple-minded filters.

What you are saying when you say that you will test for randomness is
that you will test to see if the data is meaningful in any known
context. Do you know all possible contexts? Will you live long enough
to test for all of them? What happens when the data is meaningful in
more than one context?

---------
Benjamin Ellsworth
hplabs!hpldola!ben
(303) 590-5849
P.O. Box 617
Colorado Springs, CO 80901

2+2=4 (void where prohibited, regulated, or otherwise restricted by law)
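
As a rough illustration of the randomness point above, the following
Python sketch applies one simple-minded test (a chi-square check of
byte frequencies against a uniform distribution) to a message before
and after it is hidden. The cipher is a toy XOR with a seeded
pseudo-random keystream, chosen only for brevity; it is an assumption
of this sketch, not a scheme named in the discussion, and it is not
secure. The message text and the seed are likewise made up.

```python
import random
from collections import Counter


def chi_square_uniform(data: bytes) -> float:
    """Chi-square statistic of byte counts against a uniform distribution."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(256))


def xor_with_keystream(data: bytes, seed: int) -> bytes:
    """Toy cipher (NOT secure): XOR each byte with a seeded pseudo-random byte."""
    rng = random.Random(seed)
    return bytes(b ^ rng.getrandbits(8) for b in data)


# Hypothetical message, repeated so every byte value has a usable expected count.
message = b"process measurements from a ball-bearing factory\n" * 200
hidden = xor_with_keystream(message, seed=1987)

# With 255 degrees of freedom, a statistic near 255 looks uniform;
# a statistic in the thousands clearly does not.
print("plain  chi-square:", round(chi_square_uniform(message)))
print("hidden chi-square:", round(chi_square_uniform(hidden)))
print("recovered intact :", xor_with_keystream(hidden, seed=1987) == message)
```

The hidden bytes look uniform to this filter, yet anyone holding the
seed (the context) recovers the message exactly. Passing the test says
nothing about whether information is present.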