Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!uakari.primate.wisc.edu!ames!amdcad!sun!imagen!daemon From: ib@apolling (Ivan N. Bach) Newsgroups: comp.ai.neural-nets Subject: Data Complexity Message-ID: <4542@imagen.UUCP> Date: 15 Oct 89 22:08:08 GMT Sender: daemon@imagen.UUCP Lines: 344 How can we measure the capacity to store and transmit information? Claude Shannon, founder of the theory of information, suggested that the capacity of a system to store and transmit information can be determined by calculating the degree of randomness or disorganization in the system [1]. He decided to call this measure "entropy," because he realized that his formula: H = -K x sum of (pi x log pi) for all i=1 to i=a (where "H" is the entropy of a system with "a" possible states or events, "K" is a constant, and "pi" is the probability of occurrence of the i-th state or event) was essentially the same as the formula that is used in statistical thermodynamics to calculate the entropy or disorganization of a thermodynamic system. It can be proved mathematically [2] that the entropy of a system will be at its maximum when all system states or events are equally probable and independent, i.e., when the probability of each event in a system with "a" states or events is equal to 1/a, and the probability of an event is independent from the probabilities of all other events. This maximum entropy is equal to log a. When we set the constant "K" to 1, and use base 2 to calculate logarithms, the unit of entropy is called a "bit." One of the best books on the use of Shannon's entropy formula to calculate the information capacity of biological information systems is the book "Information Theory and The Living System" written by Lila Gatlin [3]. If a system has only one state, the probability of occurrence of this state is equal to 1. The entropy of such a system is equal to zero. The entropy of an information system with 2 states will be maximum when the probability of each event is equal to 1/2. This follows from the basic requirement that the sum of all probabilities of all possible events must always be equal to 1. The entropy of such a system is: Hmax = -(p1 x log2 of p1 + p2 x log2 of p2) bits = -(1/2 x log2 of 1/2 + 1/2 x log2 of 1/2) bits = -(log2 of 1/2) bits = log2 of 2 bits (because -log2 of 1/a = log2 of a) = 1 bit Computer programmers usually think of a single-bit variable as a variable which can have either value 0 or value 1. The maximum entropy or information capacity of a single-bit variable is 1 bit. Lila Gatlin showed that Claude Shannon's formula could be used to calculate the entropy of a randomly chosen base in the DNA chain of bases in the nucleus of a living cell. If all four types of bases in a DNA chain are equiprobable, the entropy of a randomly chosen base in the DNA chain is equal to: Hmax = -log2 of 1/4 bits = log2 of 4 bits = 2 bits Computer programmers know that they have to use at least 2 bits to encode 4 different values. The maximum entropy for a randomly chosen symbol from a sequence of symbols selected from an alphabet of "a" symbols (a=4 for DNA) is equal to log2 of "a" bits. If all bases in a DNA chain are not equiprobable, the probability of a certain type of base occurring at a random position in a DNA chain is proportional to the number of bases of its type in the whole chain, i.e., it depends on the composition of DNA. All bases in the DNA of "Escherichia coli" bacteria are equiprobable. Bases in the DNA of "Micrococcus lysodeikticus" appear with the following probabilities: p(A)=p(T)=.145 and p(C)=p(G)=.355 because adenine (A) bonds only to thymine (T), and cytosine (C) bonds only to guaning (G). Therefore, the entropy of a randomly chosen base in the DNA chain of "Micrococcus lysodeikticus" is equal to: H = -(.145 x log2 of .145 + .145 x log2 of .145 + .355 x log2 of .355 + .355 x log2 of .355) bits = 1.87 bits Notice that this value is smaller than the maximum value of 2 bits for equiprobable bases. Lila Gatlin came to the conclusion that the entropy of a complex system is equal to the sum of entropies of its components. The number of base pairs per sex or germ cell ranges from 10,000 for bacteria to over 1,000,000,000 for mammals. If we assume equiprobable bases, the maximum total entropy of the DNA chain in a human sex cell is about 5.8 gigabits (2.9 billion nucleotide pairs x 2 bits per nucleotide pair). When a sperm cell fuses with an egg cell, the maximum entropy of the DNA chain in the nucleus of the fertilized cell is equal to 11.6 gigabits. Notice that the DNA in a fertilized egg cell contains just a "blueprint" for a human body. Dancoff and Quastler [4] [5] tried to estimate very roughly the entropy of an adult human body by calculating the amount of information required to specify the type, position, and orientation of atoms in the body. There are probably about 60 different types of atoms in the human body. Assuming 64 different atoms which appear with equal frequency, the maximum entropy of a randomly selected atom is equal to 6 bits. However, not all types of atoms occur with the same frequency. Only three types of atoms, carbon, hydrogen and oxygen, are present in significant amounts. Their approximate relative frequencies in the human body are 0.66, 0.22 and 0.07, respectively. The frequency of all other types of atoms is about 0.05. Therefore, the actual entropy of a human body is about 1.4 bits per atom. About 23 more bits are needed to specify proper orientation of each atom. That brings the total entropy per atom to about 24.4 bits. It is not necessary to specify that many bits per each atom, because more than 90% of body's weight consists of water and other comparatively featureless substances. Therefore, the total entropy of a human body is: H = 24.4 bits x 10% of 7x10 to the power of 27 (the total number of atoms in the body) = 2x10 to the power of 28 bits (or 10 to the power of 24 pages of English text) Similar astronomical figures were calculated in different ways by Linschitz [6] and Morowitz [7]. What I find fascinating is that the information needed to specify the blueprint for a human body is so efficiently compressed that it can be placed into the nucleus of each of about 50 trillion cells in that body. Each sex cell contains only half a blueprint. Each non-sex cell contains all the information needed to build every type of protein in a human body. Which types of proteins are actually built depends on the position of biological switches. Several valuable lessons relevant to the design of extremely complex neural networks can be deduced from the studies of biological systems. 1. The storage of a neural network is usually measured in the number of interconnects. If you can calculate the number of different states of a neural network, you can calculate its maximum information capacity by assuming that all states are equiprobable. If the probability of occurrence of each state is either known or can be calculated, you can calculate the actual information capacity of your network. Once you can calculate the capacity of different networks, you can objectively compare them, and determine which network represents the most efficient way of storing information. In Fig. 2.13 on page 33 of the DARPA Neural Network Study [27], you can see a comparison of the storage and speed of a leech, worm, fly, etc. It is estimated that a human brain has the storage capacity of 10 to the power of 14 interconnects, and the speed of 10 to the power of 16 interconnects per second. 2. Many researchers are trying to find analytical methods for designing optimal neural networks. I suggest that we use genetic algorithms to design extremely complex and optimized neural networks automatically. Genetic algorithms require an objective function or a goal. We can specify the minimum number of neural nodes or the minimum number of layers in a neural network as our goal. We can then use a random process to generate generations of neural nets. We can use the objective function to evaluate members of each generation, and discard those members which do not achieve a good enough result with respect to the objective function. We can then cross the most successful members in each generation until we get an optimal network. Research has shown that application of a priori knowledge of which areas of the domain of a complex objective function should be explored usually results in suboptimal results, i.e., in the finding of a local optimum. Genetic algorithms based on random processes, thoroughly but not exhaustively explore all areas of the objetive function's domain, and usually result in the finding of an overall or global optimum [7][8]. I have come to the conclusion that very intelligent systems must be very complex. We cannot use conventional computer programming methods to specify each detail of an extremely complex information system. If we restrict ourselves to only analytical methods of constructing neural networks, we will not be able to construct networks that will be complex enough to have the information capacity equal to or greater than the capacity of a human brain. 3. The methods used to construct human neural networks are totally different from the methods which are now used to construct artificial neural networks and design complex VLSI circuits. The technology of biological systems can be viewed as an extremely advanced and alien technology. To not take advantage of this technology would be like coming across a UFO or a planet with very advanced technology, and passing up the opportunity to study and apply that technology. The Turing machine is an abstract automaton with a head that is used to read instructions from an abstract tape. Most computer scientists think that the Turing machine is used to prove theorems in the theory of automata, and that such a mechanism should not be implemented because there is no practical use for it. If you compare the diagram of a Turing machine in a textbook on the theory of automata with the diagram of a ribosome in a textbook on genetics or molecular biology, you will notice remarkable similarities. A ribosome in the cytoplam of a cell has a head which is used to read the instructions for the building of proteins which are stored on a "tape" called messenger RNA. Since ribosomes (the work stations for the manufacturing of proteins) are located outside the nucleus of a cell, and the DNA molecules are safely stored in the nucleus, messenger RNA molecules are needed to copy the instructions for building proteins from DNA molecules, and carry them to ribosomes. I would not be surprised if our colleagues who are reading the "sci.nanotech" newsgroup expressed an interest in such a mechanism. There must be a number of other ingenious mechanisms in our cells that are just waiting to be discovered. 4. Instead of constructing a full-blown neural network, we could specify a "blueprint" for contructing that network. I think that the amount of information in the blueprint can always be smaller than the amount of information in the network if there is any redundancy or repetition in the structure of the network. If a very complex network is used to store large amounts of similar types of information, whole sections of that network should have identical or very similar structure. There is no reason why we should use a different structure each time we want to store a data item of a certain type. If whole areas of the network are going to have the same structure, we could specify in the blueprint that a certain number of subnetworks of a given type is needed in a particular location, and we may not care how each detail of each subnetwork is going to be constructed. I suspect that Mother Nature had coded recursive subroutine calls into our DNA molecules long time before computer programmers rediscovered such calls. 5. If you use a certain "alphabet" of components to construct your neural net, you should use each type of component with the same frequency if you want achieve the maximum information capacity. Main references for the artificial evolution and genetic algorithms seem to be: A. Turing (1936-1952) [8] Turing machines, the chemical basis of morphogenesis. J. von Neumann (1952-53) [9] Reproduction and evolution of automata. L. Fogel, A. Owens, and M. Walsh (1966) [10] Artificial evolution through simulated evolution. J. Bagley (1967) [11] Behavior of adaptive systems. R. Rosenberg (1967) [12] Relationship between genotype and phenotype. R. Weinberger (1970) [13] Computer simulation of a living cell. E. Goodman (1972) [14] Adaptive behavior of simulated bacterial cells. N. Foo and J. Bosworth (1972) [15] Various aspects of genetic operators. D. Frantz (1972) [16] Non-linearities in genetic adaptive research. J. Holland (1962, 1975 and 1976) [17][18][19] Theory of adaptive systems and spontaneous emergence of self-reproducing automata. K. DeJong, A. Brindle, and A. Bethke (1975-81) [20] Genetic algorithms and function optimization. S. Smith (1980) [21] Learning systems based on genetic algorithms. J. Grefenstette (1983) [22] Optimization of genetic search algorithms. M. Mauldin (1984) [23] Maintaining diversity in genetic search. A. Goldberg and I. Pohl (1984) [24] Complexity theory and AI research. D. Goldberg (1985, 1989) [25][26] Application of genetic algorithms. REFERENCES 1. Shannon, C. E., "A Mathematical Theory of Communication," "Bell System Technical Journal," Vol. 27, pp. 379-423 and 623-656. (Reprinted in C. E. Shannon and W. Weaver, "The Mathematical Theory of Communication," pp. 3-91, U. of Illinois Press, Urbana, 1949.) 2. Khinchin, A. I., "Mathematical Foundations of Information Theory," Dover, New York, 1957. 3. Gatlin, Lila, "Information Theory and The Living System," Columbia University Press, New York, 1972. 4. Dancoff, S. M. and H. Quastler, "The Information Content and Error Rate of Living Things," in "Information Theory in Biology" edited by H. Quastler, U. of Illinois Press, Urbana, 1953. 5. Calow, Peter, "Biological Machines: A Cybernetic Approach to Life," Edward Arnold Publishing Co., London, England, 1976. 6. Linschitz, H., "Information Content of a Bacterial Cell" in "Information Theory in Biology," edited by H. Quastler, U. of Illinois Press, Urbana, 1953. 7. Morowitz, H. J., "Bull. Maths. Biophys.," Vol. 17, 1955, pp. 81-86. 8. Turing, A. M., "The Chemical Basis of Morphogenesis," "Phil. Trans. Roy. Soc.," ser. B, No. 237, London, 1952. 9. Von Neumann, John, a lecture at the U. of Illinois published in: Arthur W. Burks, ed., "Theory of Self-Reproducing Automata," U. of Illinois Press, Urbana, 1966. 10. Fogel, L. Owens et al., "Artificial Intelligence Through Simulated Evolution," John Wiley and Sons, Inc., New York, 1966. 11. Bagley, J. D., "The Behavior of Adaptive Systems Which Employ Genetic and Correlation Algorithms," Ph.D. thesis, U. of Michigan, Ann Arbor, 1967. 12. Rosenberg, R. S., "Simulation of Genetic Populations with Biochemical Properties," Ph.D. thesis, U. of Michigan, Ann Arbor, 1967. 13. Weinberger, R., "Computer Simulation of a Living Cell," Ph.D. thesis, U. of Michigan, Ann Arbor, 1970. 14. Goodman, E. D., "Adaptive Behavior of Simulated Bacterial Cells Subjected to Nutritional Shifts," Ph.D. thesis, U. of Michigan, Ann Arbor, 1972. 15. Foo, N. Y. and J. L. Bosworth, "Algebraic, geometric, and stochastic aspects of genetic operators," Tech. Rep. 003120-2-T, U. of Michigan, Ann Arbor, 1972. 16. Frantz, D. R., "Non-linearities in Genetic Adaptive Search," Ph.D. thesis, U. of Michigan, Ann Arbor, 1972. l7. Holland, J. H. "Concerning Efficient Adaptive Systems," pp. 215-230 of "Self- Organizing Systems," edited by M. C. Yovits, G. T. Jacobi, and G. J. Goldstein, Spartan Books, Washington, D.C., 1962. 18. Holland, J. H., "Outline for a Logical Theory of Adaptive Systems," Journal of the Assoc. for Computing Machinery, Vol. 9, No. 3, July 1962, pp. 297-314. 19. Holland, J. H., "Studies of the Spontaneous Emergence of Self-Replicating Systems Using Cellular Automata and Formal Grammars," pp. 385-404 from "Automata, Languages, Development," edited by Aristid Lindenmayer and Grzegorz Rosenberg, North-Holland Publishing Company, Amsterdam, 1976. 20. DeJong, K. A., "Analysis of the Behavior of a Class of Genetic Adaptive Systems," Ph.D. thesis, Dept. of Computer and Communications Sciences, U. of Michigan, Ann Arbor, 1975. 21. Smith, S., "A Learning System Based on Genetic Adaptive Algorithms," Ph.D. thesis, U. of Pittsburgh, Pa., 1980. 22. Grefenstette, J. G., "Optimization of Genetic Search Algorithms," Tech. Rep. CS-83-14, Vanderbilt U., 1983. 23. Mauldin, L. Michael, "Maintaining Diversity in Genetic Search," Proc. of the 1984 National Conference on Artificial Intelligence, pp. 247-250. 24. Goldberg, Allen and Ira Pohl, "Is Complexity Theory of Use to AI?," pp. 43-55 from "Artificial Human Intelligence," edited by A. Elithorn and R. Banerji, Elsevier Science Publishers, Amsterdam, 1984. 25. Goldberg, David, "Dynamic System Control Using Rule Learning and Genetic Algorithms," Proc., of the 1985 National Conference on AI, pp. 588-592. 26. Goldberg, E. David, "Genetic Algorithms In Search, Optimization and Machine Learning," Addison-Wesley Publishing Company, Inc., Reading, 1989. 27. "DARPA Neural Network Study," AFCEA International Press, Fairfax, 1988.