Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!uakari.primate.wisc.edu!ames!amdcad!sun!imagen!daemon
From: ib@apolling (Ivan N. Bach)
Newsgroups: comp.ai.neural-nets
Subject: Data Complexity
Message-ID: <4542@imagen.UUCP>
Date: 15 Oct 89 22:08:08 GMT
Sender: daemon@imagen.UUCP
Lines: 344

How can we measure the capacity to store and transmit information?  Claude Shannon,
founder of the theory of information, suggested that the capacity of a system
to store and transmit information can be determined by calculating the degree of
randomness or disorganization in the system [1].  He decided to call this measure
"entropy," because he realized that his formula:

     H = -K x sum of (pi x log pi) for all i=1 to i=a

(where "H" is the entropy of a system with "a" possible states or events, "K" is
a constant, and "pi" is the probability of occurrence of the i-th state or event)
was essentially the same as the formula that is used in statistical thermodynamics
to calculate the entropy or disorganization of a thermodynamic system.

It can be proved mathematically [2] that the entropy of a system will be at its 
maximum when all system states or events are equally probable and independent,
i.e., when the probability of each event in a system with "a" states or events
is equal to 1/a, and the probability of an event is independent from the probabilities
of all other events.  This maximum entropy is equal to log a.

When we set the constant "K" to 1, and use base 2 to calculate logarithms, the unit
of entropy is called a "bit."

One of the best books on the use of Shannon's entropy formula to calculate the
information capacity of biological information systems is the book "Information
Theory and The Living System" written by Lila Gatlin [3]. 

If a system has only one state, the probability of occurrence of this state is
equal to 1.  The entropy of such a system is equal to zero.

The entropy of an information system with 2 states will be maximum when the
probability of each event is equal to 1/2.  This follows from the basic requirement
that the sum of all probabilities of all possible events must always be equal to 1.
The entropy of such a system is:

    Hmax = -(p1 x log2 of p1 + p2 x log2 of p2) bits
         = -(1/2 x log2 of 1/2 + 1/2 x log2 of 1/2) bits
         = -(log2 of 1/2) bits
         = log2 of 2 bits    (because -log2 of 1/a = log2 of a)
         = 1 bit 

Computer programmers usually think of a single-bit variable as a variable which can
have either value 0 or value 1.  The maximum entropy or information capacity of a 
single-bit variable is 1 bit.

Lila Gatlin showed that Claude Shannon's formula could be used to calculate the
entropy of a randomly chosen base in the DNA chain of bases in the nucleus of a living
cell.  If all four types of bases in a DNA chain are equiprobable, the entropy of a
randomly chosen base in the DNA chain is equal to:

    Hmax = -log2 of 1/4 bits
         = log2 of 4 bits
         = 2 bits

Computer programmers know that they have to use at least 2 bits to encode 4 different
values.  The maximum entropy for a randomly chosen symbol from a sequence of symbols
selected from an alphabet of "a" symbols (a=4 for DNA) is equal to log2 of "a" bits.

If all bases in a DNA chain are not equiprobable, the probability of a certain type
of base occurring at a random position in a DNA chain is proportional to the number of
bases of its type in the whole chain, i.e., it depends on the composition of DNA.
All bases in the DNA of "Escherichia coli" bacteria are equiprobable.  Bases in the 
DNA of "Micrococcus lysodeikticus" appear with the following probabilities:

    p(A)=p(T)=.145 and p(C)=p(G)=.355

because adenine (A) bonds only to thymine (T), and cytosine (C) bonds only to guaning
(G).  Therefore, the entropy of a randomly chosen base in the DNA chain of "Micrococcus
lysodeikticus" is equal to:

    H = -(.145 x log2 of .145 + .145 x log2 of .145 + .355 x log2 of .355 + 
        .355 x log2 of .355) bits 
      = 1.87 bits

Notice that this value is smaller than the maximum value of 2 bits for equiprobable 
bases. 

Lila Gatlin came to the conclusion that the entropy of a complex system is equal to
the sum of entropies of its components.  

The number of base pairs per sex or germ cell ranges from 10,000 for bacteria to over 
1,000,000,000 for mammals.  If we assume equiprobable bases, the maximum total entropy
of the DNA chain in a human sex cell is about 5.8 gigabits (2.9 billion nucleotide pairs
x 2 bits per nucleotide pair).  When a sperm cell fuses with an egg cell, the maximum
entropy of the DNA chain in the nucleus of the fertilized cell is equal to 11.6 gigabits.  

Notice that the DNA in a fertilized egg cell contains just a "blueprint" for a human body.  
Dancoff and Quastler [4] [5] tried to estimate very roughly the entropy of an adult human
body by calculating the amount of information required to specify the type, position, and
orientation of atoms in the body.  There are probably about 60 different types of atoms in
the human body.  Assuming 64 different atoms which appear with equal frequency, the 
maximum entropy of a randomly selected atom is equal to 6 bits.  However, not all types
of atoms occur with the same frequency.  Only three types of atoms, carbon, hydrogen and  
oxygen, are present in significant amounts.  Their approximate relative frequencies in
the human body are 0.66, 0.22 and 0.07, respectively.  The frequency of all other types
of atoms is about 0.05.  Therefore, the actual entropy of a human body is about 1.4 bits
per atom.  About 23 more bits are needed to specify proper orientation of each atom.  That
brings the total entropy per atom to about 24.4 bits.  It is not necessary to specify 
that many bits per each atom, because more than 90% of body's weight consists of water
and other comparatively featureless substances.  Therefore, the total entropy of a
human body is:

    H = 24.4 bits x 10% of 7x10 to the power of 27 (the total number of atoms in the body)
      = 2x10 to the power of 28 bits (or 10 to the power of 24 pages of English text)

Similar astronomical figures were calculated in different ways by Linschitz [6] and
Morowitz [7].

What I find fascinating is that the information needed to specify the blueprint for a 
human body is so efficiently compressed that it can be placed into the nucleus of each 
of about 50 trillion cells in that body.  Each sex cell contains only half a blueprint.  
Each non-sex cell contains all the information needed to build every type of protein in 
a human body.  Which types of proteins are actually built depends on the position of
biological switches.

Several valuable lessons relevant to the design of extremely complex neural networks
can be deduced from the studies of biological systems.

1. The storage of a neural network is usually measured in the number of interconnects.
   If you can calculate the number of different states of a neural network, you can
   calculate its maximum information capacity by assuming that all states are
   equiprobable.  If the probability of occurrence of each state is either known or
   can be calculated, you can calculate the actual information capacity of your 
   network.

   Once you can calculate the capacity of different networks, you can objectively
   compare them, and determine which network represents the most efficient way of
   storing information.

   In Fig. 2.13 on page 33 of the DARPA Neural Network Study [27], you can see
   a comparison of the storage and speed of a leech, worm, fly, etc.  It is
   estimated that a human brain has the storage capacity of 10 to the power
   of 14 interconnects, and the speed of 10 to the power of 16 interconnects per
   second. 

2. Many researchers are trying to find analytical methods for designing optimal neural
   networks.  I suggest that we use genetic algorithms to design extremely complex and
   optimized neural networks automatically.  Genetic algorithms require an objective
   function or a goal.  We can specify the minimum number of neural nodes or the minimum
   number of layers in a neural network as our goal.  We can then use a random process
   to generate generations of neural nets.  We can use the objective function to
   evaluate members of each generation, and discard those members which do not achieve
   a good enough result with respect to the objective function.  We can then cross
   the most successful members in each generation until we get an optimal network.
 
   Research has shown that application of a priori knowledge of which areas of the 
   domain of a complex objective function should be explored usually results in
   suboptimal results, i.e., in the finding of a local optimum.  Genetic algorithms  
   based on random processes, thoroughly but not exhaustively explore all areas
   of the objetive function's domain, and usually result in the finding of an overall 
   or global optimum [7][8].

   I have come to the conclusion that very intelligent systems must be very
   complex.  We cannot use conventional computer programming methods to specify
   each detail of an extremely complex information system.  If we restrict ourselves
   to only analytical methods of constructing neural networks, we will not
   be able to construct networks that will be complex enough to have the information
   capacity equal to or greater than the capacity of a human brain.

3. The methods used to construct human neural networks are totally different from
   the methods which are now used to construct artificial neural networks and design
   complex VLSI circuits.  The technology of biological systems can be viewed as
   an extremely advanced and alien technology.  To not take advantage of this
   technology would be like coming across a UFO or a planet with very advanced 
   technology, and passing up the opportunity to study and apply that technology.

   The Turing machine is an abstract automaton with a head that is used to read
   instructions from an abstract tape.  Most computer scientists think that the
   Turing machine is used to prove theorems in the theory of automata, and that
   such a mechanism should not be implemented because there is no practical use
   for it.  If you compare the diagram of a Turing machine in a textbook
   on the theory of automata with the diagram of a ribosome in a textbook on
   genetics or molecular biology, you will notice remarkable similarities.
  
   A ribosome in the cytoplam of a cell has a head which is used to read the
   instructions for the building of proteins which are stored on a "tape"
   called messenger RNA.  Since ribosomes (the work stations for the manufacturing
   of proteins) are located outside the nucleus of a cell, and the DNA molecules
   are safely stored in the nucleus, messenger RNA molecules are needed to copy
   the instructions for building proteins from DNA molecules, and carry them to
   ribosomes.

   I would not be surprised if our colleagues who are reading the "sci.nanotech"
   newsgroup expressed an interest in such a mechanism.  There must be a number
   of other ingenious mechanisms in our cells that are just waiting to be
   discovered.

4. Instead of constructing a full-blown neural network, we could specify a
   "blueprint" for contructing that network.  I think that the amount of information
   in the blueprint can always be smaller than the amount of information in the
   network if there is any redundancy or repetition in the structure of the
   network.  If a very complex network is used to store large amounts of
   similar types of information, whole sections of that network should have
   identical or very similar structure.  There is no reason why we should use
   a different structure each time we want to store a data item of a certain
   type.  If whole areas of the network are going to have the same structure,
   we could specify in the blueprint that a certain number of subnetworks of
   a given type is needed in a particular location, and we may not care how each
   detail of each subnetwork is going to be constructed.  I suspect that Mother
   Nature had coded recursive subroutine calls into our DNA molecules long time 
   before computer programmers rediscovered such calls.

5. If you use a certain "alphabet" of components to construct your neural net,
   you should use each type of component with the same frequency if you want
   achieve the maximum information capacity.

Main references for the artificial evolution and genetic algorithms seem to be: 

   A. Turing (1936-1952) [8]
   Turing machines, the chemical basis of morphogenesis.

   J. von Neumann (1952-53) [9]
   Reproduction and evolution of automata.

   L. Fogel, A. Owens, and M. Walsh (1966) [10]
   Artificial evolution through simulated evolution.

   J. Bagley (1967) [11]
   Behavior of adaptive systems.

   R. Rosenberg (1967) [12]
   Relationship between genotype and phenotype.

   R. Weinberger (1970) [13]
   Computer simulation of a living cell.

   E. Goodman (1972) [14]  
   Adaptive behavior of simulated bacterial cells.
 
   N. Foo and J. Bosworth (1972) [15]
   Various aspects of genetic operators.

   D. Frantz (1972) [16]
   Non-linearities in genetic adaptive research.

   J. Holland (1962,  1975 and 1976) [17][18][19]
   Theory of adaptive systems and spontaneous emergence of self-reproducing
   automata.
 
   K. DeJong, A. Brindle, and A. Bethke (1975-81) [20]
   Genetic algorithms and function optimization.

   S. Smith (1980) [21]
   Learning systems based on genetic algorithms.

   J. Grefenstette (1983) [22]
   Optimization of genetic search algorithms.

   M. Mauldin (1984) [23]
   Maintaining diversity in genetic search.

   A. Goldberg and I. Pohl (1984) [24]
   Complexity theory and AI research.

   D. Goldberg (1985, 1989) [25][26]
   Application of genetic algorithms.

REFERENCES

1. Shannon, C. E., "A Mathematical Theory of Communication," "Bell System Technical
   Journal," Vol. 27, pp. 379-423 and 623-656.  (Reprinted in C. E. Shannon and W.
   Weaver, "The Mathematical Theory of Communication," pp. 3-91, U. of Illinois
   Press, Urbana, 1949.)

2. Khinchin, A. I., "Mathematical Foundations of Information Theory," Dover, New York,
   1957. 

3. Gatlin, Lila, "Information Theory and The Living System," Columbia University Press,
   New York, 1972.

4. Dancoff, S. M. and H. Quastler, "The Information Content and Error Rate of Living
   Things," in "Information Theory in Biology" edited by H. Quastler, U. of Illinois Press,
   Urbana, 1953.

5. Calow, Peter, "Biological Machines: A Cybernetic Approach to Life," Edward Arnold
   Publishing Co., London, England, 1976.

6. Linschitz, H., "Information Content of a Bacterial Cell" in "Information Theory in
   Biology," edited by H. Quastler, U. of Illinois Press, Urbana, 1953.

7. Morowitz, H. J., "Bull. Maths. Biophys.," Vol. 17, 1955, pp. 81-86. 

8. Turing, A. M., "The Chemical Basis of Morphogenesis," "Phil. Trans. Roy. Soc.," ser.
   B, No. 237, London, 1952.

9. Von Neumann, John, a lecture at the U. of Illinois published in: Arthur W. Burks, ed.,
   "Theory of Self-Reproducing Automata," U. of Illinois Press, Urbana, 1966.

10. Fogel, L. Owens et al., "Artificial Intelligence Through Simulated Evolution," 
    John Wiley and Sons, Inc., New York, 1966.

11. Bagley, J. D., "The Behavior of Adaptive Systems Which Employ Genetic and
    Correlation Algorithms," Ph.D. thesis, U. of Michigan, Ann Arbor, 1967.

12. Rosenberg, R. S., "Simulation of Genetic Populations with Biochemical Properties,"
    Ph.D. thesis, U. of Michigan, Ann Arbor, 1967.

13. Weinberger, R., "Computer Simulation of a Living Cell," Ph.D. thesis, U. of 
    Michigan, Ann Arbor, 1970.

14. Goodman, E. D., "Adaptive Behavior of Simulated Bacterial Cells Subjected to 
    Nutritional Shifts," Ph.D. thesis, U. of Michigan, Ann Arbor, 1972.

15. Foo, N. Y. and J. L. Bosworth, "Algebraic, geometric, and stochastic aspects of
    genetic operators," Tech. Rep. 003120-2-T, U. of Michigan, Ann Arbor, 1972.

16. Frantz, D. R., "Non-linearities in Genetic Adaptive Search," Ph.D. thesis,
    U. of Michigan, Ann Arbor, 1972.

l7. Holland, J. H. "Concerning Efficient Adaptive Systems," pp. 215-230 of "Self-
    Organizing Systems," edited by M. C. Yovits, G. T. Jacobi, and G. J. Goldstein,
    Spartan Books, Washington, D.C., 1962.

18. Holland, J. H., "Outline for a Logical Theory of Adaptive Systems," Journal
    of the Assoc. for Computing Machinery, Vol. 9, No. 3, July 1962, pp. 297-314.

19. Holland, J. H., "Studies of the Spontaneous Emergence of Self-Replicating
    Systems Using Cellular Automata and Formal Grammars," pp. 385-404 from "Automata,
    Languages, Development," edited by Aristid Lindenmayer and Grzegorz Rosenberg,
    North-Holland Publishing Company, Amsterdam, 1976. 

20. DeJong, K. A., "Analysis of the Behavior of a Class of Genetic Adaptive
    Systems," Ph.D. thesis, Dept. of Computer and Communications Sciences,
    U. of Michigan, Ann Arbor, 1975.

21. Smith, S., "A Learning System Based on Genetic Adaptive Algorithms," Ph.D.
    thesis, U. of Pittsburgh, Pa., 1980.

22. Grefenstette, J. G., "Optimization of Genetic Search Algorithms," Tech. Rep.
    CS-83-14, Vanderbilt U., 1983.

23. Mauldin, L. Michael, "Maintaining Diversity in Genetic Search," Proc. of
    the 1984 National Conference on Artificial Intelligence, pp. 247-250.

24. Goldberg, Allen and Ira Pohl, "Is Complexity Theory of Use to AI?," pp.
    43-55 from "Artificial Human Intelligence," edited by A. Elithorn and R.
    Banerji, Elsevier Science Publishers, Amsterdam, 1984. 

25. Goldberg, David, "Dynamic System Control Using Rule Learning and Genetic
    Algorithms," Proc., of the 1985 National Conference on AI, pp. 588-592. 

26. Goldberg, E. David, "Genetic Algorithms In Search, Optimization and Machine
    Learning," Addison-Wesley Publishing Company, Inc., Reading, 1989.

27. "DARPA Neural Network Study," AFCEA International Press, Fairfax, 1988.