Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!sun-barr!newstop!sun!imagen!daemon
From: ib@apolling (Ivan N. Bach)
Newsgroups: comp.ai.neural-nets
Subject: Data Complexity
Message-ID: <4522@imagen.UUCP>
Date: 12 Oct 89 16:24:47 GMT
Sender: daemon@imagen.UUCP
Lines: 106

Many scientists believe that you cannot accept something as a science unless you can measure it in some way. Scientists often prove or disprove their theories by performing experiments in which they measure some key quantities, and then decide whether the results of their measurements are in accordance with the proposed theory. You must also be able to repeat such experiments.

When it comes to expert systems and neural nets, many authors tend to make all kinds of extravagant claims that are usually not supported by any objective measurements. For example, they claim that a given expert system has achieved the capability of a human expert. This would, presumably, mean that the expert system is as complex as those components of the human brain that are the material basis for the human expertise, unless the expert system is designed more efficiently than the human brain.

When I worked as a consultant at AT&T Bell Laboratories, I did some research on measuring the complexity of biological and artificial information systems. I was surprised by how much work had already been done in this area. I came to the conclusion that you can use entropy calculations to determine the MAXIMUM amount of information that can be stored in a particular information system, i.e., its maximum information capacity. For example, I calculated the maximum amount of information that can be stored in a fused human egg cell because that egg cell contains a blueprint for a human being. One way of measuring the complexity of any system is to find out how much information is needed to completely describe a blueprint for that system, and, for example, transmit it to a nearby star.

The instructions for building all the proteins in a human body are stored in DNA molecules in the nucleus of the egg cell. The information is encoded by using an alphabet that consists of just four characters: A (adenine), C (cytosine), G (guanine), and T (thymine). To calculate the maximum information capacity of a DNA molecule, you have to assume that the distribution of bases is completely random, i.e., that there are no restrictions imposed on the order in which bases can appear in the DNA chain. Under that assumption, each base carries at most log2(4) = 2 bits. The distribution of bases in real cells is not random, and, therefore, the actual information capacities of such cells are less than the potential maxima. The maximum capacity of a fused human egg cell is several gigabits.

Once the genes of a human being are completely mapped, we will be able to calculate exactly the information capacity of those genes. In the meantime, we can calculate the maximum possible capacity of human genes. That is better than not having an objective measurement, and it has a number of precedents in mathematics. Mathematicians can sometimes determine the limits within which the values of a particular function must lie, but they may not be able to calculate the actual function values.

I also came to the conclusion that the information capacity of an information system depends on the number of internal states that are actually used to store information. For example, you can view a stone as a system with just one internal state (0 bits of information) if you do not use its atoms to store information. You can view an electrical relay that can be closed or open as a system with just two internal states (1 bit of information) if you use the open state to store a 0 and the closed state to store a 1, etc. You can calculate the maximum information capacity of an expert system by taking into account the total number of characters it contains and the number of characters in its alphabet.

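A minimal sketch of the arithmetic described above, written in Python. The function names are hypothetical, and the figure of three billion bases is only an illustrative round number, not a value taken from the post. The first function gives the upper bound (every symbol equally likely); the second estimates the entropy of an observed symbol distribution, which can only be lower.

```python
import math
from collections import Counter


def max_capacity_bits(length, alphabet_size):
    """Upper bound, in bits, on the information held by `length` symbols
    drawn from an alphabet of `alphabet_size` equally likely symbols."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("length must be >= 0 and alphabet_size >= 1")
    return length * math.log2(alphabet_size)


def entropy_bits_per_symbol(text):
    """Shannon entropy of the symbol frequencies observed in `text`.
    Zeroth-order estimate: correlations between neighbours are ignored."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A stone used as a one-state system stores 0 bits; a two-state relay stores 1.
print(max_capacity_bits(1, 1))  # 0.0
print(max_capacity_bits(1, 2))  # 1.0

# Four-letter DNA alphabet: at most 2 bits per base.
ASSUMED_BASES = 3_000_000_000  # assumption for illustration only
print(max_capacity_bits(ASSUMED_BASES, 4) / 1e9, "gigabits")  # 6.0 gigabits

# Evenly distributed bases reach the 2-bit maximum; a skewed mix falls short.
print(entropy_bits_per_symbol("ACGT" * 10))  # 2.0
print(entropy_bits_per_symbol("AAAAAAAT"))   # well below 2.0
```

The same `max_capacity_bits` call covers the expert-system case: pass the total number of characters as `length` and the size of the character set as `alphabet_size`.
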
Researchers have calculated the information capacities of viruses, insects, mammals, etc. It is interesting that the ordering of biological systems by their information capacity corresponds very closely with the usual ordering of biological systems into lower (less complex) and higher (more complex) species. After you calculate the information capacity of different artificial and biological systems, you can present your results in the form of a diagram.

 relay         expert system
   |                 |                                          information
   V                 V                                          capacity in bits
0 ++--...---+--------+---------------------------------...---+-------------->
  ^         ^                                                ^
  |         |                                                |
stone     virus                                       human egg cell

I came to the conclusion that an artificial system could achieve the same level of intelligence (information processing) as a human being with a smaller number of internal states if its design was more optimal than the design of the human being. Unfortunately, over several billion years nature has used genetic algorithms to produce biological information systems of incredible complexity and optimization. The level of miniaturization used in DNA is about four orders of magnitude greater than the level of miniaturization used in integrated circuits with a 1-micron geometry.

British researchers could not figure out how the information for producing all the proteins needed to construct a virus could be encoded in just a couple of thousand bases. The information stored in DNA is read in triplets. They discovered that the code for one protein started at a certain location on the DNA chain. The code for another protein started on the next base, i.e., the codes overlapped.

I came to the conclusion that an artificial intelligence system that would be at the same intellectual level as a human being would have to be extremely complex even if its design is highly optimized.
We will need a very large number of internal states to implement an artificial subsystem that would imitate the capabilities of, for example, a human eye, simply because of the large amount of information that must be processed in real time. I do not believe that such complex systems can be produced by conventional coding of computer systems. I think that we will have to use automatic, self-organizing methods based on genetic algorithms to gradually develop more and more complex artificial systems until we eventually achieve and surpass the complexity and intellectual capability of human beings.

An artificial information system will have a different internal model of the outside world than a human being, and a totally alien set of values, unless such a system has the shape of a human baby and goes through the experiences of adolescence, adulthood, and old age, or we control the information it receives from the outside world in such a way that it gets the impression that it is a human baby and that it is going through the experiences of a human being.

I am not saying that we should try to create such an artificial system. I am just saying that an immobile artificial system that does not have the shape of a human body, and that receives information from the outside world through sensors that have no relationship to human senses, cannot possibly develop an intellect similar to ours. We are more likely to develop an alien race of artificial beings than to develop artificial beings that cannot be distinguished from human beings, as described in Isaac Asimov's novels on robots.

If you are interested in this topic, I can post detailed entropy calculations and references to relevant books and articles.