Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!sun-barr!newstop!sun!imagen!daemon
From: ib@apolling (Ivan N. Bach)
Newsgroups: comp.ai.neural-nets
Subject: Data Complexity
Message-ID: <4522@imagen.UUCP>
Date: 12 Oct 89 16:24:47 GMT
Sender: daemon@imagen.UUCP
Lines: 106

Many scientists believe that you cannot accept something as a science unless you
can measure it in some way.  Scientists often prove or disprove their theories by
performing experiments in which they measure some key quantities, and then decide 
whether the results of their measurements are in accordance with the proposed theory.
You must also be able to repeat such experiments.

When it comes to expert systems and neural nets, many authors tend to make all
kinds of extravagant claims that are usually not supported by any objective measurements.
For example, they claim that a given expert system has achieved the capability of
a human expert.  This would, presumably, mean that the expert system is as complex as
those components of the human brain that are the material basis for the human
expertise, unless the expert system is designed more efficiently than the human brain.

When I worked as a consultant at AT&T Bell Laboratories, I did some research on
measuring the complexity of biological and artificial information systems.  I was
surprised by how much work has already been done in this area.  I came to the conclusion
that you can use entropy calculations to determine the MAXIMUM amount of information that
can be stored in a particular information system, i.e., its maximum information capacity.

For example, I calculated the maximum amount of information that can be stored in a
fused human egg cell because that egg cell contains a blueprint for a human being.
One way of measuring the complexity of any system is to find out how much information
is needed to completely describe a blueprint for that system, and, for example, transmit
it to a nearby star.  The instructions for building all the proteins in a human body
are stored in DNA molecules in the nucleus of the egg cell.  The information is encoded
by using an alphabet that consists of just four characters: A (adenine), C (cytosine),
G (guanine), and T (thymine).  To calculate the maximum information capacity of a
DNA molecule, you have to assume that the distribution of bases is completely random,
i.e., that there are no restrictions imposed on the order in which bases can appear
in the DNA chain.  The distribution of bases in real cells is not random, and, therefore,
the actual information capacities of such cells are less than the potential maxima.
The maximum capacity of a fused human egg cell is several gigabits.
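
As a rough sketch of the kind of entropy calculation described above: with four
equally likely bases, each base carries at most log2(4) = 2 bits, and any skew in
the base frequencies lowers that figure.  The base count and the skewed frequencies
below are illustrative assumptions, not figures from this post, and the function
names are made up for the example.

```python
import math

def max_capacity_bits(n_symbols, alphabet_size):
    """Upper bound: every symbol equally likely and independent of the others."""
    return n_symbols * math.log2(alphabet_size)

def entropy_bits_per_symbol(probs):
    """Shannon entropy of a symbol distribution, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# ASSUMPTION: a round figure of 3 billion bases, chosen only for illustration.
ASSUMED_BASES = 3_000_000_000

uniform = [0.25, 0.25, 0.25, 0.25]   # A, C, G, T equally likely
skewed = [0.30, 0.20, 0.20, 0.30]    # hypothetical non-uniform frequencies

max_bits = max_capacity_bits(ASSUMED_BASES, 4)
skewed_bits = ASSUMED_BASES * entropy_bits_per_symbol(skewed)

print("maximum capacity: %.2f gigabits" % (max_bits / 1e9))
print("with skewed base frequencies: %.2f gigabits" % (skewed_bits / 1e9))
```

Single-base frequencies capture only one kind of non-randomness.  Restrictions on
the order in which bases appear would reduce the actual capacity further.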

Once the genes of a human being are completely mapped, we will be able to calculate
exactly the information capacity of those genes.  In the meantime, we can calculate
the maximum possible capacity of human genes.  That is better than not having an
objective measurement, and it has a number of precedents in mathematics.  Mathematicians
can sometimes determine the limits within which the values of a particular function must lie,
but they may not be able to calculate the actual function values.
   
I also came to the conclusion that the information capacity of an information system
depends on the number of internal states that are actually used to store information.
For example, you can view a stone as a system with just one internal state (0 bits of
information) if you do not use its atoms to store information.  You can view an electrical
relay that can be closed or open as a system with just two internal states (1 bit of
information) if you use the open state to store a 0 and the closed state to store a 1,
etc.  You can calculate the maximum information capacity of an expert system from the
total number of characters it contains and the number of characters in its alphabet:
N characters drawn from an alphabet of k characters can store at most N * log2(k) bits.
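
The stone and relay examples can be sketched in a few lines, assuming that a system
with S usable internal states stores at most log2(S) bits.  The expert system size
and alphabet below are hypothetical numbers picked for illustration, and the
function names are not taken from any library.

```python
import math

def bits_from_states(n_states):
    """Maximum information, in bits, of a system with n_states usable states."""
    return math.log2(n_states)

def text_capacity_bits(n_chars, alphabet_size):
    """n_chars positions with alphabet_size choices each give
    alphabet_size ** n_chars states, i.e. n_chars * log2(alphabet_size) bits."""
    return n_chars * math.log2(alphabet_size)

print("stone (1 state): ", bits_from_states(1), "bits")
print("relay (2 states):", bits_from_states(2), "bits")

# HYPOTHETICAL expert system: 100,000 characters over a 64-character alphabet.
print("expert system:   ", text_capacity_bits(100_000, 64), "bits")

# Both views agree: 3 positions over a 4-symbol alphabet have 4**3 = 64 states.
print(bits_from_states(4 ** 3) == text_capacity_bits(3, 4))
```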

Researchers have calculated the information capacities of viruses, insects, mammals, etc.
It is interesting that the ordering of biological systems by their information capacity
corresponds very closely with the usual ordering of biological systems into lower (less
complex) and higher (more complex) species.  After you calculate the information capacity
of different artificial and biological systems, you can present your results in the form
of a diagram.


        relay           expert system 
          |                 |                                           information
          V                 V                                           capacity in bits
       0 ++--...---+--------+---------------------------------...---+-------------->
         ^         ^                                                ^ 
         |         |                                                |
       stone    virus                                         human egg cell


I came to the conclusion that an artificial system could achieve the same level of
intelligence (information processing) as a human being with a smaller number of
internal states if its design were more efficient than the design of the human being. 
Unfortunately, over several billion years nature has used genetic algorithms to
produce biological information systems of incredible complexity and optimization.
The level of miniaturization used in DNA is about four orders of magnitude greater than 
the level of miniaturization used in integrated circuits with a 1-micron geometry.  

British researchers could not figure out how the information for producing all the
proteins needed to construct a virus could be encoded in just a couple of thousand
bases.  The information stored in DNA is read in triplets.  They discovered that the
code for one protein started at a certain location on the DNA chain.  The code for
another protein started on the next base, i.e., the codes overlapped.
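
To make the overlap concrete, here is a small sketch that reads the same stretch of
bases in triplets from two starting points one base apart.  The sequence is a
made-up example, not real viral DNA, and the function name is hypothetical.

```python
def codons(seq, start):
    """Split seq into complete triplets, beginning at index start."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]

# HYPOTHETICAL sequence, chosen only to illustrate the idea.
seq = "ATGGCTAAGT"

frame_a = codons(seq, 0)   # reading starts on the first base
frame_b = codons(seq, 1)   # reading starts on the next base

print(frame_a)
print(frame_b)
```

The two readings share the same bases but yield entirely different triplets, which
is how one stretch of DNA can carry the code for more than one protein.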

I came to the conclusion that an artificial intelligence system that would be at
the same intellectual level as a human being would have to be extremely complex even
if its design is highly optimized.  We will need a very large number of internal states
to implement an artificial subsystem that would imitate the capabilities of, for
example, a human eye simply because of the large amount of information that must be
processed in real time.  I do not believe that such complex systems can be produced
by conventional coding of computer systems.  I think that we will have to use
self-organizing methods based on genetic algorithms to develop, automatically and gradually,
more and more complex artificial systems until we eventually achieve and surpass the 
complexity and intellectual capability of human beings.

An artificial information system will have a different internal model of the
outside world than a human being, and a totally alien set of values, unless such a
system has the shape of a human baby and goes through the experiences of adolescence,
adulthood, and old age, or we control the information it receives from the outside world
in such a way that it gets the impression that it is a human baby and that it is going
through the experiences of a human being.  I am not saying that we should try to create
such an artificial system.  I am just saying that an immobile artificial system that
does not have the shape of a human body, and that receives information from the outside
world through some sensors that have no relationship to human senses, cannot possibly
develop an intellect similar to ours.  We are more likely to develop an alien race
of artificial beings than to develop artificial beings that cannot be distinguished from
human beings as described in Isaac Asimov's novels on robots.  

If you are interested in this topic, I can post detailed entropy calculations and 
references to relevant books and articles.