Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!apple!agate!shelby!helens!isl!wan
From: wan@isl.Stanford.EDU (Eric A. Wan)
Newsgroups: comp.ai.neural-nets
Subject: Re: Backpropagation with Newton's Method, and recurrence.  Source code.
Message-ID: <1248@helens.Stanford.EDU>
Date: 14 Nov 90 01:56:01 GMT
References: <7607@uwm.edu>
Sender: news@helens.Stanford.EDU
Organization: Stanford University
Lines: 33


We find your recent postings concerning Newton's method for neural
networks somewhat misleading.  Using Newton's method to find a zero of
the MSE surface does not seem like a very wise thing to do.  The MSE
surface almost never has a zero in a real problem.  The goal of an
optimization problem like training a neural network is to minimize an
objective function, not to find a zero in that function (especially if
one does not exist).  You do not want to use Newton's method to find a
zero of the error surface, but rather a zero of its gradient.  Some of
the difficulties you encountered (e.g., infinite jumps near a local
minimum) are a result of this error: near a minimum the gradient goes
to zero while the error itself does not, so a root-finding step of
size error/gradient blows up.  You mention that the method does not
work well for f(x) = x^2.  In fact, Newton's method applied to the
gradient of x^2 converges in one step.
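
As a minimal sketch of the distinction (in Python; the function names
and starting points are ours, chosen only for illustration), compare
one Newton step on the gradient with one Newton root-finding step on
the function itself:

```python
def newton_on_gradient(grad, hess, x):
    """One Newton step toward a stationary point, i.e. solving grad(x) = 0."""
    return x - grad(x) / hess(x)

def newton_on_function(f, grad, x):
    """One Newton root-finding step, i.e. solving f(x) = 0."""
    return x - f(x) / grad(x)

# f(x) = x^2: the minimum and the only zero coincide at x = 0.
f = lambda x: x * x
g = lambda x: 2.0 * x      # gradient of both x^2 and x^2 + 1
h = lambda x: 2.0          # second derivative of both

x_min = newton_on_gradient(g, h, 5.0)    # reaches the minimum in one step
x_root = newton_on_function(f, g, 5.0)   # only halves the distance to 0

# f1(x) = x^2 + 1: minimum at x = 0, but no zero anywhere.
f1 = lambda x: x * x + 1.0
near_min = 1e-3
jump = abs(newton_on_function(f1, g, near_min) - near_min)

print(x_min, x_root, jump)
```

On x^2 the gradient version lands exactly on the minimum, while the
root-finding version merely halves x.  On x^2 + 1 the root-finding
step taken from a point close to the minimum is several hundred units
long, which is the kind of jump described above.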

Furthermore, you are treating the terms independently.  Only the
diagonal terms of the "Hessian" (not actually a Hessian in your
algorithm) are being approximated.  This is not Newton's method; true
Newton's method requires the full Hessian.  Using only the diagonal
performs no rotation of the step direction.  It merely rescales each
coordinate, which is more akin to improving the eigenvalue spread.
Le Cun and
Becker (Proceedings of the Connectionist Models Summer School, 1988)
derived a method for finding the exact diagonal values.  However,
their method is only valid for networks with a single hidden layer.
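
To illustrate why the diagonal alone is not Newton's method, here is a
small sketch on a two-weight quadratic error (0.5 w'Aw - b'w) with
coupled weights.  The matrix A, vector b, and function names are
assumptions made up for this example, not anything from the posted
code:

```python
# Quadratic error with off-diagonal coupling; its minimum is at w = (1, 1).
A = ((2.0, 1.0), (1.0, 2.0))   # the (constant) Hessian
b = (3.0, 3.0)

def gradient(w):
    """Gradient A w - b of the quadratic."""
    return (A[0][0] * w[0] + A[0][1] * w[1] - b[0],
            A[1][0] * w[0] + A[1][1] * w[1] - b[1])

def full_newton_step(w):
    """w - inverse(A) * gradient, using the explicit 2x2 inverse."""
    g = gradient(w)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    dx = (A[1][1] * g[0] - A[0][1] * g[1]) / det
    dy = (-A[1][0] * g[0] + A[0][0] * g[1]) / det
    return (w[0] - dx, w[1] - dy)

def diagonal_step(w):
    """Per-weight rescaling by the diagonal of A only (no rotation)."""
    g = gradient(w)
    return (w[0] - g[0] / A[0][0], w[1] - g[1] / A[1][1])

w0 = (0.0, 0.0)
w_full = full_newton_step(w0)   # exactly the minimum
w_diag = diagonal_step(w0)      # overshoots it
print(w_full, w_diag)
```

The full step reaches the minimum at once because the Hessian of a
quadratic is exact; the diagonal step ignores the coupling between
the two weights and overshoots to (1.5, 1.5).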

Note: Newton's method (applied to the gradient), as well as other
second-order methods (Conjugate Gradient, Quasi-Newton, etc.), has
been studied for use with neural networks.  In all cases the
algorithms are subject to local minima.  It is true that your method
is not subject to local minima.  On the other hand, it will also never
reach a stable solution unless you impose arbitrary halting rules.
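
The dependence on local minima can be sketched with a one-dimensional
error that has two minima of different depth.  The function, starting
points, and iteration limits below are assumptions picked for
illustration:

```python
def f(x):
    """Tilted double well: two minima, the left one deeper."""
    return x**4 - 2.0 * x**2 + 0.5 * x

def grad(x):
    return 4.0 * x**3 - 4.0 * x + 0.5

def hess(x):
    return 12.0 * x**2 - 4.0

def newton_minimize(x, steps=50, tol=1e-12):
    """Newton's method applied to the gradient, iterated until the step is tiny."""
    for _ in range(steps):
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            break
    return x

left = newton_minimize(-2.0)    # settles in the deeper, left-hand minimum
right = newton_minimize(2.0)    # settles in the shallower, right-hand minimum
print(left, f(left), right, f(right))
```

Both runs stop at a point where the gradient vanishes and the second
derivative is positive, but only the run started on the left finds the
lower of the two minima.  Which minimum is reached depends entirely on
the starting point.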


- M. Lehr, E. Wan, F. Beaufays, S. Piche