Path: utzoo!attcan!uunet!snorkelwacker!think!nrl-cmf!tedwards
From: tedwards@nrl-cmf.UUCP (Thomas Edwards)
Newsgroups: comp.ai.neural-nets
Subject: Re: What good are neural nets?
Summary: Watch for progress
Message-ID: <79@nrl-cmf.UUCP>
Date: 22 Mar 90 16:54:21 GMT
References: <68764@aerospace.AERO.ORG> <TED.90Mar21113353@kythera.nmsu.edu> <2355@rnd.GBA.NYU.EDU> <TED.90Mar21175729@kythera.nmsu.edu> <14746@phoenix.Princeton.EDU> <TED.90Mar22085433@kythera.nmsu.edu>
Reply-To: tedwards@cmsun.UUCP (Thomas Edwards)
Organization: NRL Connection Machine Facility, Washington, DC
Lines: 40

In article <TED.90Mar22085433@kythera.nmsu.edu> ted@nmsu.edu (Ted Dunning) writes:
>unfortunately, in most of the networks exhibited so far, the scaling
>of the size of the neural net or the accuracy required is prohibitive.

I'll be the first one to admit that backpropagation learning can be truly tedious, and
using it on anything but toy problems will definitely leave one with a
bad taste in the mouth for neural networks.

However, researchers are realizing that there are major problems with backpropagation:

1) fixed step size--vanilla backprop uses only the first derivative (the gradient), so it
                    has little information about how far it should step along the error
                    surface each iteration.  Momentum, though useful, requires much
                    tweaking for each problem.  Methods which use or approximate
                    higher order derivatives (such as Quickprop (Fahlman, 1988) or
                    conjugate gradient methods) provide up to an order of magnitude
                    decrease in learning time.
2) moving targets---if solving a problem requires the network to evolve into two groups
                    of neurons solving inter-related subproblems, then whenever one
                    group changes significantly, the other group must also change to
                    continue to "work" effectively with the first.  Also, if
                    the network as a whole works on one subproblem first, it
                    might then forget how to solve that subproblem when it begins
                    to solve a second one.
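
To make item 1 concrete, here is a minimal sketch of a Quickprop-style step for a single
weight.  It is a simplification, not Fahlman's full algorithm: it fits a parabola through
the current and previous slopes and jumps toward its minimum, with a cap on step growth.
The names quickprop_step and minimize, the learning rate 0.1, and the growth limit 1.75
are illustrative assumptions.

```python
import math


def quickprop_step(grad, prev_grad, prev_step, lr=0.1, mu=1.75):
    """Return the next change for ONE weight (simplified Quickprop-style rule).

    grad, prev_grad: dE/dw at the current and previous epoch.
    prev_step:       the weight change applied at the previous epoch.
    lr, mu:          assumed values for the bootstrap rate and the growth cap.
    """
    # No usable history (or identical slopes): fall back to plain gradient descent.
    if prev_step == 0.0 or prev_grad == grad:
        return -lr * grad
    # Slope has the same sign and did not shrink: the parabola fit would be
    # unbounded or point uphill, so just grow the previous step by mu.
    if grad * prev_grad > 0 and abs(grad) >= abs(prev_grad):
        return mu * prev_step
    # Secant (parabola) jump: uses two successive slopes as a cheap
    # approximation of second-derivative information.
    step = prev_step * grad / (prev_grad - grad)
    # Never grow by more than a factor of mu over the previous step.
    limit = mu * abs(prev_step)
    if abs(step) > limit:
        step = math.copysign(limit, step)
    return step


def minimize(grad_fn, w, epochs, use_quickprop=True, lr=0.1):
    """Run `epochs` updates on a single weight w and return the final value."""
    prev_grad, prev_step = 0.0, 0.0
    for _ in range(epochs):
        g = grad_fn(w)
        if use_quickprop:
            step = quickprop_step(g, prev_grad, prev_step, lr)
        else:
            step = -lr * g  # fixed-step "vanilla" update
        w += step
        prev_grad, prev_step = g, step
    return w


def toy_grad(w):
    """Gradient of the toy error surface E(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)
```

On this toy quadratic, minimize(toy_grad, 0.0, 10) reaches the minimum at w = 3 within a
few epochs, while the fixed-step version (use_quickprop=False) is still roughly 0.3 away
after ten.  Real error surfaces are not quadratic, so the gain there is less tidy.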

  Anyway, backprop is not the only model available to researchers.  I encourage programmers
to look at conjugate-gradient, Quickprop, and cascade-correlation.  Cascade-correlation
(Fahlman and Lebiere, 1990) has solved the two intertwined spirals problem in 1700 training
epochs (which are faster than backprop epochs), compared to 20,000 backprop epochs with a
2-5-5-5-1 network (with "short-cuts") (Lang and Witbrock, 1988).
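
For anyone who wants to try the benchmark, here is a rough sketch of the two-spirals
training set and of the size of the 2-5-5-5-1 network.  Both parts rest on assumptions:
the constants (97 points per spiral, radius 6.5, angle step pi/16) are those of the
commonly circulated version of the benchmark as I recall it, and "short-cuts" is taken
to mean that every unit receives input from all earlier layers plus a bias.  The function
names are made up for illustration.

```python
import math


def two_spirals(points_per_spiral=97):
    """Return a list of ((x, y), label) pairs for two interlocking spirals.

    Constants are assumed from the commonly circulated benchmark generator.
    """
    data = []
    for i in range(points_per_spiral):
        angle = i * math.pi / 16.0
        radius = 6.5 * (104 - i) / 104.0
        x = radius * math.sin(angle)
        y = radius * math.cos(angle)
        data.append(((x, y), 1))    # point on the first spiral
        data.append(((-x, -y), 0))  # mirrored point on the second spiral
    return data


def shortcut_weight_count(layer_sizes):
    """Count weights when each unit sees ALL earlier layers plus one bias.

    This reading of "short-cuts" is an assumption, not taken from the post.
    """
    total = 0
    units_so_far = layer_sizes[0]  # the input units
    for size in layer_sizes[1:]:
        total += size * (units_so_far + 1)  # +1 for the bias weight
        units_so_far += size
    return total
```

Under these assumptions two_spirals() gives 194 labelled points, and
shortcut_weight_count([2, 5, 5, 5, 1]) gives 138 weights, which is small for a net that
needs 20,000 epochs.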

 Fahlman, S.E. (1988) "Faster-Learning Variations on Back-Propagation: An Empirical
                       Study" in _Proceedings_of_the_1988_Connectionist_Models_Summer_
                       School_, Morgan Kaufmann.
 Fahlman, S.E., and Lebiere, C. (1990) "The Cascade-Correlation Learning Architecture",
                       Carnegie Mellon.
 Lang, K., and Witbrock, M. (1988) "Learning to Tell Two Spirals Apart" in _Proceedings_of_
                       the_1988_Connectionist_Models_Summer_School_, Morgan Kaufmann.

-Thomas Edwards