Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!apple!agate!shelby!helens!isl!wan
From: wan@isl.Stanford.EDU (Eric A. Wan)
Newsgroups: comp.ai.neural-nets
Subject: Re: Backpropagation with Newton's Method, and recurrence. Source code.
Message-ID: <1248@helens.Stanford.EDU>
Date: 14 Nov 90 01:56:01 GMT
References: <7607@uwm.edu>
Sender: news@helens.Stanford.EDU
Organization: Stanford University
Lines: 33

We find your recent postings concerning Newton's Method for neural networks somewhat misleading. Using Newton's method to find a zero of the mse surface does not seem like a very wise thing to do. The mse surface almost never has a zero in a real problem. The goal of an optimization problem like training a neural network is to minimize an objective function, not to find a zero in that function (especially if one does not exist).

You do not want to use Newton's method to find a zero of the error surface but rather of the gradient. Some of the difficulties you encountered (i.e. infinite jumps near a local minimum) are a result of this error. You mention the method does not work well for f(x) = x^2. In fact, Newton's Method applied to the gradient of x^2 converges in one step.

Furthermore, you are considering terms independently. Only diagonal terms of the "Hessian" (not actually a Hessian in your algorithm) are being approximated. This is not Newton's method. True Newton's method requires the full Hessian. Using the diagonal performs no rotation and is more akin to improving eigenvalue spread. Le Cun and Becker (Proceedings of the Connectionist Models Summer School, 1988) derived a method for finding the exact diagonal values. However, their method is only valid for networks with a single hidden layer.

Note: Newton's Methods (applied to the gradient) as well as other second order methods (Conjugate Gradient, Quasi-Newton, etc.) have been studied for use with neural networks. In all cases the algorithms are subject to local minima.
It is true your method is not subject to local minima. On the other hand, it will also never reach a stable solution unless you impose arbitrary halting rules.

- M. Lehr, E. Wan, F. Beaufays, S. Piche
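
The points above about f(x) = x^2 and about the diagonal of the Hessian can be checked numerically. What follows is only a minimal sketch in Python, not code from either posting. The function names, the error surface e(x) = x^2 + 1, and the 2x2 matrix A are hypothetical and chosen only for illustration.

```python
def newton_on_gradient(grad, hess, x):
    """One Newton step for minimization: seeks a zero of the gradient."""
    return x - grad(x) / hess(x)


def newton_on_function(f, grad, x):
    """One Newton root-finding step on f itself (the approach criticized above)."""
    return x - f(x) / grad(x)


# f(x) = x^2 has gradient 2x and second derivative 2.
f = lambda x: x * x
g = lambda x: 2.0 * x
h = lambda x: 2.0

x_min = newton_on_gradient(g, h, 5.0)   # reaches the minimum at 0 in one step
x_root = newton_on_function(f, g, 5.0)  # x - x^2/(2x) = x/2, so 5.0 -> 2.5

# Hypothetical error surface with no zero: e(x) = x^2 + 1 (same gradient as f).
e = lambda x: x * x + 1.0
jump = newton_on_function(e, g, 1e-3)   # near the minimum the step is about -500

# Hypothetical 2-D quadratic E(w) = 0.5 * w^T A w with coupled weights.
A = ((2.0, 1.0), (1.0, 2.0))
w = (1.0, 1.0)
grad_w = (A[0][0] * w[0] + A[0][1] * w[1],
          A[1][0] * w[0] + A[1][1] * w[1])

# Full Newton step: solve A d = grad_w (Cramer's rule for the 2x2 case).
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
d = ((A[1][1] * grad_w[0] - A[0][1] * grad_w[1]) / det,
     (A[0][0] * grad_w[1] - A[1][0] * grad_w[0]) / det)
full_step = (w[0] - d[0], w[1] - d[1])  # lands on the minimum (0, 0)

# Diagonal-only step: each weight is treated independently.
diag_step = (w[0] - grad_w[0] / A[0][0],
             w[1] - grad_w[1] / A[1][1])  # overshoots to (-0.5, -0.5)

print(x_min, x_root, jump, full_step, diag_step)
```

In this sketch, the step on the gradient reaches the minimum of x^2 at once, while root-finding on the function itself only halves x, and on a surface with no zero it is thrown far from the minimum. For the coupled quadratic, the full Hessian step reaches the minimum in one step and the diagonal-only step does not.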