Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!uakari.primate.wisc.edu!aplcen!jhunix!ins_atge
From: ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards)
Newsgroups: comp.ai
Subject: Re: What Has Traditional AI Accomplished?
Summary: Neural Nets and Traditional AI
Message-ID: <6664@jhunix.HCF.JHU.EDU>
Date: 19 Oct 90 01:15:44 GMT
References: <69609@lll-winken.LLNL.GOV> <1990Oct15.143325.26044@unislc.uucp> <1990Oct16.135631.6444@cbnewsj.att.com>
Organization: The Johns Hopkins University - HCF
Lines: 70

In article <1990Oct16.135631.6444@cbnewsj.att.com> jwi@cbnewsj.att.com (Jim Winer @ AT&T, Middletown, NJ) writes:
>Keith L. Breinholt writes:
>
>| Someone correct me if I'm wrong, I thought Neural Nets as an area of
>| study was only 5 or so years old.  In terms of research, 5 years is
>| baby technology.  If Neural Nets are consistent with other research it
>| won't make it into general public acceptance for another 5 to 10
>| years.
>
>I worked on the Mark I Perceptron (Rosenblatt model) in 1959 
>at Cornell Aeronautical Laboratories, Inc. (defunct) under contract
>to Office of Naval Research (ONR). That makes the field at least
>30 years old. Neural Nets have been inconvenient to work with until 
>recently when specialized hardware has become available.

Actually, the death of neural nets in the late sixties and their rebirth
a few years ago is a complex story.  Adalines, Perceptrons, and
similar two-layer neural systems were developed, and actually proved
useful in limited ways for signal processing.  The big limitation was
that with only two feedforward layers (input and output) of step-function
or sigmoidal units, the only input-to-output mappings that could be
represented were those that divide the input space with a single
boundary, a hyperplane (i.e. functions like exclusive-OR, which are not
linearly separable, could not be represented by the structure).
It was fairly obvious from very early neural models that "hidden layers"
were required between the input and output neural layers.
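
To make the exclusive-OR point concrete, here is a small sketch in
Python (my own illustration, not anything from the original systems).
It searches a coarse grid of weights for a single threshold unit that
computes XOR and finds none, then shows a hand-wired net with two hidden
units that does.  The grid, the weights, and the function names are all
assumptions chosen for the example.

```python
from itertools import product

# Truth table for exclusive-OR.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def step(z):
    """Step-function activation: fires only for strictly positive input."""
    return 1 if z > 0 else 0


def single_unit(w1, w2, b, x1, x2):
    """One threshold unit connected directly to the two inputs."""
    return step(w1 * x1 + w2 * x2 + b)


def single_unit_solves_xor(grid):
    """Try every (w1, w2, b) on the grid; True if any reproduces XOR."""
    for w1, w2, b in product(grid, repeat=3):
        if all(single_unit(w1, w2, b, x1, x2) == t
               for (x1, x2), t in XOR.items()):
            return True
    return False


def hidden_layer_net(x1, x2):
    """Hand-wired net: XOR = OR and not AND, using two hidden units."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)


grid = [i / 2 for i in range(-8, 9)]          # -4.0, -3.5, ..., 4.0
no_hidden_ok = single_unit_solves_xor(grid)
hidden_ok = all(hidden_layer_net(*x) == t for x, t in XOR.items())
print(no_hidden_ok, hidden_ok)
```

The grid search is only a demonstration, not a proof; the real argument
is that no single line separates the two XOR classes.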
  Now, the learning rule for these nets (strictly speaking the Adaline's
delta rule, more than Rosenblatt's perceptron rule) was developed by
agreeing on an error function to be minimized (usually the sum of squares
of differences between actual outputs and desired outputs).  Training was
done by moving along the negative gradient of this error function, thus
(usually) minimizing it.
However, while it is fairly obvious how to differentiate the error function
for a two-layer net, no one could work out how to differentiate the
error function for multiple layers.  Marvin Minsky made some comments on
the difficulty of this in _Perceptrons_, and a lot of people lost interest
in these models.
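
For the two-layer case the gradient really is a one-liner.  The sketch
below (an illustration under my own assumptions: one sigmoid output
unit, the OR function as training data, and an arbitrary learning rate
and epoch count) updates the weights after each example by stepping
down the gradient of the squared error.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_delta_rule(data, lr=0.5, epochs=2000):
    """Per-example gradient descent on E = 0.5 * (y - t)**2."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), t in data:
            y = sigmoid(w[0] * x1 + w[1] * x2 + b)
            # dE/dz by the chain rule: (y - t) times sigmoid'(z) = y * (1 - y).
            delta = (y - t) * y * (1.0 - y)
            # dz/dw_i is just the input x_i, and dz/db is 1.
            w[0] -= lr * delta * x1
            w[1] -= lr * delta * x2
            b -= lr * delta
    return w, b


# OR is linearly separable, so a net with no hidden layer can learn it.
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_delta_rule(or_data)
predictions = [1 if sigmoid(w[0] * x1 + w[1] * x2 + b) > 0.5 else 0
               for (x1, x2), _ in or_data]
print(predictions)
```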
   Eventually someone worked out how to find the error function gradient
for multiple layer networks.  It really isn't that hard to do, and I
don't understand what was so difficult about it.  I guess the difficult
concept was passing error back from the output layer to the hidden layer,
and prudent use of the chain rule.  Really, I wonder why it took so long
to work out.  Actually, some people did work it out in
the seventies (Paul Werbos's 1974 thesis is the usual example), but after
_Perceptrons_ perhaps people were just turned off by NNs.
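
Here is roughly what "passing error back" amounts to, as a sketch in
Python.  The network shape (two inputs, two sigmoid hidden units, one
sigmoid output), the weight values, and the function names are all my
own assumptions for illustration.  The output error term is multiplied
back through the output weights and the hidden units' derivatives, and
the result is checked against a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return hidden activations h and output y for input x."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(b1))]
    y = sigmoid(sum(W2[j] * h[j] for j in range(len(h))) + b2)
    return h, y


def loss(params, x, t):
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2


def backprop(params, x, t):
    """Gradient of the loss with respect to every weight and bias."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    # Error term at the output unit.
    delta_out = (y - t) * y * (1.0 - y)
    gW2 = [delta_out * h[j] for j in range(len(h))]
    gb2 = delta_out
    # Chain rule: pass the output error back through W2 to each hidden unit.
    delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
    gW1 = [[delta_hid[j] * x[i] for i in range(len(x))]
           for j in range(len(h))]
    gb1 = delta_hid[:]
    return gW1, gb1, gW2, gb2


def numeric_grad_W1(params, x, t, j, i, eps=1e-6):
    """Central-difference estimate of dLoss/dW1[j][i]."""
    W1, b1, W2, b2 = params
    plus = [row[:] for row in W1]
    minus = [row[:] for row in W1]
    plus[j][i] += eps
    minus[j][i] -= eps
    return (loss((plus, b1, W2, b2), x, t)
            - loss((minus, b1, W2, b2), x, t)) / (2.0 * eps)


params = ([[0.3, -0.2], [0.1, 0.4]], [0.05, -0.1], [0.7, -0.5], 0.2)
x, t = [1.0, 0.5], 1.0
gW1, gb1, gW2, gb2 = backprop(params, x, t)
print(gW1[0][1], numeric_grad_W1(params, x, t, 0, 1))
```

The two printed numbers should agree to many decimal places, which is
about all the "difficult concept" comes to.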
  Finally, with the publication of _Parallel_Distributed_Processing_,
everyone saw how easy it was to program a multi-layer perceptron,
and other NN structures such as Boltzmann Machines.  At first, however,
mathematical failure of NN researchers #2 happened:  fixed step size
gradient descent was used.  Anyone from mathematical sciences can tell
you that this is a silly way to minimize a function, and learning
speedups of several orders of magnitude can easily be achieved with
conjugate-gradient and other more advanced minimization methods.
Thus people were led to believe that even for very small problems,
NNs were slow, when in fact they really are not.
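
To show the size of the gap on the simplest possible case, here is a toy
comparison in Python.  It is not a neural net: it minimizes an
ill-conditioned two-variable quadratic, 0.5*x'Ax - b'x, once with fixed
step size descent and once with linear conjugate gradient.  The matrix,
the step size, and the tolerance are assumptions I picked for the
illustration.

```python
import math


def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def fixed_step_descent(A, b, lr, tol=1e-6, max_iter=100000):
    """Step along the negative gradient with a constant step size."""
    x = [0.0] * len(b)
    for k in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # negative gradient
        if math.sqrt(dot(r, r)) < tol:
            return x, k
        x = [xi + lr * ri for xi, ri in zip(x, r)]
    return x, max_iter


def conjugate_gradient(A, b, tol=1e-6, max_iter=100):
    """Linear conjugate gradient for a symmetric positive-definite A."""
    x = [0.0] * len(b)
    r = b[:]                       # residual b - A x at the start x = 0
    p = r[:]
    rs = dot(r, r)
    for k in range(max_iter):
        if math.sqrt(rs) < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)    # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, max_iter


A = [[100.0, 0.0], [0.0, 1.0]]     # condition number 100
b = [1.0, 1.0]
x_gd, gd_iters = fixed_step_descent(A, b, lr=0.01)
x_cg, cg_iters = conjugate_gradient(A, b)
print(gd_iters, cg_iters)
```

The fixed step has to stay small enough for the steep direction, so it
crawls along the shallow one; conjugate gradient does not have that
problem.  On a real network the error surface is not quadratic and the
nonlinear variants of the method are needed, so take the iteration
counts here only as a hint of the effect.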
  Now even recurrent neural networks can be trained, allowing NNs to have
temporal behavior.  
  But NN researchers are beginning to realize that training a big
homogeneous network is not the answer to good learning systems.
Modularization is required.  Cascade-Correlation is a NN algorithm
which adds hidden units one at a time, training each new unit to
correlate as strongly as possible with the remaining network error and
then freezing it as a feature detector that is used to reduce that
error.  It is able to solve many problems which were difficult
for homogeneous NNs to solve.
  I see a future where inductive learning by small homogeneous NNs
is used in combination with more traditional-AI-style goal building.
Cascade-Correlation is a step in that direction:  the divide-and-conquer
of traditional AI is combined with the easy inductive learning of
traditional NNs.  Of course, the trick is to couch this in a
connectionist framework to continue to allow for fast parallel 
computation.

-Thomas Edwards