Path: utzoo!attcan!uunet!snorkelwacker!think!nrl-cmf!tedwards
From: tedwards@nrl-cmf.UUCP (Thomas Edwards)
Newsgroups: comp.ai.neural-nets
Subject: Re: What good are neural nets?
Summary: Watch for progress
Message-ID: <79@nrl-cmf.UUCP>
Date: 22 Mar 90 16:54:21 GMT
References: <68764@aerospace.AERO.ORG> <2355@rnd.GBA.NYU.EDU> <14746@phoenix.Princeton.EDU>
Reply-To: tedwards@cmsun.UUCP (Thomas Edwards)
Organization: NRL Connection Machine Facility, Washington, DC
Lines: 40

In article ted@nmsu.edu (Ted Dunning) writes:
>unfortunately, in most of the networks exhibited so far, the scaling
>of the size of the neural net or the accuracy required is prohibitive.

I'll be the first one to admit that backpropagation learning can be
truly tedious, and using it on anything beyond toy problems will
definitely leave one with a bad taste in the mouth for neural networks.

However, researchers are realizing that there are major problems
with backpropagation:

1) fixed step size -- vanilla backprop does not include much in the way
   of higher order derivatives to work out how far it should step along
   the error surface each iteration. Momentum, though useful, requires
   much tweaking for each problem. Methods which involve higher order
   derivatives (such as Quickprop (Fahlman, 1988) or conjugate gradient
   methods) provide up to an order of magnitude decrease in learning
   time.

2) moving targets -- if, to solve a problem, the network must evolve
   into two groups of neurons solving inter-related problems, then
   whenever one group of neurons changes significantly, the other group
   must change too in order to continue to "work" effectively with the
   first. Also, if the network as a whole works to solve one subproblem
   of the problem, it might then forget how to solve that first
   subproblem when it begins to solve a second subproblem.

Anyway, backprop is not the only model available to researchers.
I encourage programmers to look at conjugate-gradient, quickprop,
and cascade-correlation.
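
As a rough illustration of point 1, the sketch below compares a fixed
step size against a simplified Quickprop-style update on a single weight.
It is not Fahlman's full algorithm: the toy quadratic error, the function
names (grad, fixed_step_descent, quickprop_descent), and the values of
lr, mu, and tol are all assumptions made for the example. The
Quickprop-style step fits a parabola through the current and previous
slopes and jumps toward its minimum, with a cap on how fast the step
may grow.

```python
def grad(w, a=2.0, w_star=3.0):
    """Slope dE/dw of the toy error E(w) = 0.5 * a * (w - w_star)**2."""
    return a * (w - w_star)


def fixed_step_descent(w, lr=0.05, tol=1e-6, max_steps=10000):
    """Vanilla gradient descent: each step is lr times the current slope."""
    for step in range(max_steps):
        s = grad(w)
        if abs(s) < tol:
            return w, step
        w -= lr * s
    return w, max_steps


def quickprop_descent(w, lr=0.05, mu=1.75, tol=1e-6, max_steps=10000):
    """Simplified one-weight Quickprop-style descent (illustrative only)."""
    prev_s = None   # slope seen on the previous iteration
    prev_dw = 0.0   # weight change applied on the previous iteration
    for step in range(max_steps):
        s = grad(w)
        if abs(s) < tol:
            return w, step
        if prev_s is None or prev_s == s or prev_dw == 0.0:
            # No usable history yet: fall back to a plain gradient step.
            dw = -lr * s
        else:
            # Secant estimate of where the slope reaches zero.
            dw = s / (prev_s - s) * prev_dw
            # Assumed growth cap: at most mu times the previous step size.
            limit = mu * abs(prev_dw)
            dw = max(-limit, min(limit, dw))
        w += dw
        prev_s, prev_dw = s, dw
    return w, max_steps


w_gd, n_gd = fixed_step_descent(0.0)
w_qp, n_qp = quickprop_descent(0.0)
print("fixed step:     ", n_gd, "steps, w =", w_gd)
print("quickprop-style:", n_qp, "steps, w =", w_qp)
```

On this toy surface the Quickprop-style run needs far fewer steps, but a
single quadratic weight flatters any second-order method, so the gap
should not be read as a prediction for real networks.
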
Cascade-correlation (Fahlman, 1990) has solved the two intertwined
spirals problem in 1700 training epochs (which are faster than backprop
epochs), compared to 20,000 backprop epochs with a 2-5-5-5-1 network
(with "short-cuts") (Lang, 1988).

Fahlman, S.E. (1988) "Faster-Learning Variations on Back-Propagation:
   An Empirical Study" in
   _Proceedings_of_the_1988_Connectionist_Models_Summer_School_,
   Morgan Kaufmann.

Fahlman, S.E., and Lebiere, C. (1990) "The Cascade-Correlation
   Learning Architecture" Carnegie Mellon.

Lang, K., and Witbrock, M. (1988) "Learning to Tell Two Spirals Apart"
   in _Proceedings_of_the_1988_Connectionist_Models_Summer_School_,
   Morgan Kaufmann.

-Thomas Edwards