Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!uakari.primate.wisc.edu!aplcen!jhunix!ins_atge
From: ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards)
Newsgroups: comp.ai
Subject: Re: What Has Traditional AI Accomplished?
Summary: Neural Nets and Traditional AI
Message-ID: <6664@jhunix.HCF.JHU.EDU>
Date: 19 Oct 90 01:15:44 GMT
References: <69609@lll-winken.LLNL.GOV> <1990Oct15.143325.26044@unislc.uucp> <1990Oct16.135631.6444@cbnewsj.att.com>
Organization: The Johns Hopkins University - HCF
Lines: 70

In article <1990Oct16.135631.6444@cbnewsj.att.com> jwi@cbnewsj.att.com (Jim Winer @ AT&T, Middletown, NJ) writes:
>Keith L. Breinholt writes:
>
>| Someone correct me if I'm wrong, I thought Neural Nets as an area of
>| study was only 5 or so years old. In terms of research, 5 years is
>| baby technology. If Neural Nets are consistent with other research it
>| won't make it into general public acceptance for another 5 to 10
>| years.
>
>I worked on the Mark I Perceptron (Rosenblatt model) in 1959
>at Cornell Aeronautical Laboratories, Inc. (defunct) under contract
>to Office of Naval Research (ONR). That makes the field at least
>30 years old. Neural Nets have been inconvenient to work with until
>recently when specialized hardware has become available.

Actually, the death of neural nets in the late sixties and their
rebirth a few years ago is a complex story. Adalines, Perceptrons,
and similar two-layer neural systems were developed, and actually
proved useful in limited ways for signal processing. The big
limitation was that with two feedforward layers of step-function or
sigmoidal units, the only input-to-output mappings that could be
developed were those whose regions are divided by a single line
(a hyperplane) in the input space, so functions like exclusive-OR
could not be represented by the structure. It was fairly obvious
from very early neural models that "hidden layers" were required
between the input and output neural layers.
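
The exclusive-OR limitation can be seen in a few lines of code. What
follows is only a minimal sketch in Python, assuming a single
step-function unit trained with the classic perceptron rule. The
function name train_perceptron, the learning rate, and the epoch
limit are illustrative choices, not anything from the systems
discussed above.

```python
# Toy single-layer perceptron with a step activation (illustrative only).
def train_perceptron(samples, epochs=100, lr=1.0):
    """Return (weights, bias, converged) after perceptron-rule training."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            # Step activation: fire only if the weighted sum is positive.
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - output
            if delta != 0:
                # Perceptron rule: nudge the weights toward the target.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            # A full pass with no mistakes: a separating line was found.
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_converged = train_perceptron(AND)[2]
xor_converged = train_perceptron(XOR)[2]
print("AND separable by one line:", and_converged)
print("XOR separable by one line:", xor_converged)
```

AND is learned within a few passes, while XOR never reaches an
error-free pass no matter how large the epoch limit is made, because
no single line separates its two classes.
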

Now, the perceptron learning rule was developed by agreeing on an
error function to be minimized (usually the sum of squares of
differences between actual outputs and desired outputs). Training
was done by moving along the negative gradient of this error
function, thus (usually) minimizing it. However, while it is fairly
obvious how to differentiate the error function for a two-layer net,
no one could work out how to differentiate the error function for
multiple layers. Marvin Minsky made some comments on the difficulty
of this in _Perceptrons_, and a lot of people lost interest in these
models.

Eventually someone worked out how to find the error function
gradient for multiple-layer networks. It really isn't that hard to
do, and I don't understand what was so difficult about it. I guess
the difficult concept was passing error back from the output layer
to the hidden layer, and prudent use of the chain rule. Really, I
wonder why it took so long to work out. Actually, I have a feeling
some people did work it out in the seventies, but after
_Perceptrons_ perhaps people were just turned off by NNs.

Finally, with the publication of _Parallel_Distributed_Processing_,
everyone saw how easy it was to program a multi-layer perceptron,
and other NN structures such as Boltzmann Machines. At first,
however, mathematical failure #2 of NN researchers occurred:
fixed-step-size gradient descent was used. Anyone from the
mathematical sciences can tell you that this is a silly way to
minimize a function, and learning speedups of several orders of
magnitude can easily be achieved with conjugate-gradient and other
more advanced minimization methods. Thus people were led to believe
that NNs were slow even for very small problems, when in fact they
really are not.

Now even recurrent neural networks can be trained, allowing NNs to
have temporal behavior. But NN researchers are beginning to realize
that training a big homogeneous network is not the answer to good
learning systems.
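
To make the step-size complaint concrete, here is a rough sketch in
Python comparing fixed-step gradient descent with linear conjugate
gradient. It is a stand-in, not a neural-net training run: the
objective is an assumed toy quadratic f(x) = 0.5 x'Ax - b'x with
A = diag(1, 100), and the step size, tolerance, and function names
are all illustrative assumptions.

```python
# Toy comparison on f(x) = 0.5 * x^T A x - b^T x with A = diag(1, 100).
# The quadratic, step size, and tolerance are illustrative assumptions.
A = [[1.0, 0.0], [0.0, 100.0]]
b = [1.0, 1.0]
TOL = 1e-8

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def gradient(x):
    # grad f(x) = A x - b
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

def fixed_step_descent(x, step=0.01, max_iters=100000):
    """Plain gradient descent with a constant step; returns (x, iterations)."""
    for k in range(max_iters):
        g = gradient(x)
        if dot(g, g) ** 0.5 < TOL:
            return x, k
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_iters

def conjugate_gradient(x, max_iters=100):
    """Linear conjugate gradient for A x = b; returns (x, iterations)."""
    r = [-gi for gi in gradient(x)]  # residual b - A x
    d = r[:]                         # first search direction
    for k in range(max_iters):
        rr = dot(r, r)
        if rr ** 0.5 < TOL:
            return x, k
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)      # exact line minimum along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r, r) / rr
        d = [ri + beta * di for ri, di in zip(r, d)]
    return x, max_iters

x_gd, gd_iters = fixed_step_descent([0.0, 0.0])
x_cg, cg_iters = conjugate_gradient([0.0, 0.0])
print("fixed-step iterations:", gd_iters)
print("conjugate-gradient iterations:", cg_iters)
```

On this badly scaled toy problem the fixed-step loop should take
well over a thousand iterations, while conjugate gradient needs only
a couple. Real network error surfaces are not quadratic, so the gap
there will differ, but the direction of the effect is the same.
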

Modularization is required. Cascade-Correlation is an NN algorithm
that develops the feature representations which best help to reduce
the network error, and then uses those features to minimize that
error. It is able to solve many problems which were difficult for
homogeneous NNs to solve.

I see a future where inductive learning by small homogeneous NNs is
used in combination with more traditional AI-type goal building.
Cascade-Correlation is a step in that direction: the
divide-and-conquer of traditional AI is combined with the easy
inductive learning of traditional NNs. Of course, the trick is to
couch this in a connectionist framework to continue to allow for
fast parallel computation.

-Thomas Edwards