Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!uakari.primate.wisc.edu!caen!uwm.edu!csd4.csd.uwm.edu!markh
From: markh@csd4.csd.uwm.edu (Mark William Hopkins)
Newsgroups: comp.ai.neural-nets
Subject: Re: Backpropagation with Newton's Method, and recurrence.  Source code.
Message-ID: <7684@uwm.edu>
Date: 16 Nov 90 03:21:06 GMT
References: <7607@uwm.edu> <1248@helens.Stanford.EDU>
Sender: news@uwm.edu
Organization: University of Wisconsin - Milwaukee
Lines: 41

In article <1248@helens.Stanford.EDU> wan@isl.Stanford.EDU (Eric A. Wan) writes:
>You mention the method does not work well for f(x) = x^2.  In fact, Newton's
>Method applied to the gradient of x^2 converges in one step.

You lost me here.  Try using Newton's method to find a zero of x^2.  This is
what I was describing.  It converges only linearly, halving the error at each
step, which is the same (relatively poor) rate as binary search:

		  x[n+1] = x[n] -  x[n]^2/(2*x[n]) = 1/2 * x[n].
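
To make the two readings concrete, here is a minimal sketch in Python (the
function names and the starting point x = 1.0 are illustrative assumptions,
not part of the posted source code).  It runs Newton's method on x^2 itself,
which is the case described above, and on the gradient 2x, which is the case
Eric describes:

```python
def newton_root(f, df, x0, steps):
    """Newton's method for a ZERO of f: x <- x - f(x)/f'(x)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs


def newton_minimum_step(df, d2f, x):
    """Newton's method applied to the gradient: x <- x - f'(x)/f''(x)."""
    return x - df(x) / d2f(x)


f = lambda x: x * x      # the function whose zero is sought
df = lambda x: 2.0 * x   # its first derivative (the gradient)
d2f = lambda x: 2.0      # its second derivative

# Zero-finding on x^2: each iterate is half the previous one.
root_iterates = newton_root(f, df, 1.0, 4)
print(root_iterates)     # [1.0, 0.5, 0.25, 0.125, 0.0625]

# Zero-finding on the gradient 2x: lands on the minimum in one step.
min_after_one_step = newton_minimum_step(df, d2f, 1.0)
print(min_after_one_step)  # 0.0
```

Both statements hold; they just apply the method to different functions.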

>Using Newton's method to find a zero of the mse surface does not seem like a
>very wise thing to do.  The mse surface almost never has a zero in a real
>problem.  The goal of an optimization problem like training a neural network
>is to minimize an objective function, not to find a zero in that function
>(especially if one does not exist).

I simply disagree.

You have to concede: it's a valid point to say that the goal is to find a zero
or near-zero of the error function.  It's no use finding any minima if they
aren't 'near' zero.  And if there aren't any minima AT ALL near a zero on the
surface, then how could it even make sense to talk about convergence in the
first place?  A "solution" with a significant error is simply not a solution,
even if it's "optimum".

It goes the other way around too: if you're already near a zero, then going
further to find a minimum is just plain counter-productive.  You know the
saying: "if it ain't broke, don't fix it".  All you end up doing is forcing the
net to learn something that, in the judgement of the people who defined "near"
for that particular application, it ALREADY 'knows'.

(On Newton's Method never fully converging):
>On the other hand, it will also never reach a stable solution unless you
>impose arbitrary halting rules.

I pointed this out, but argued that it is a positive feature: the neural net
autonomously determines when it learns and when it doesn't.
The particular halting rule I described was not arbitrary but followed directly
from the considerations above about when you can say something is "solved" and
when it's not "solved".  You just provide a cutoff on the error function
itself.
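
A rough sketch of such a cutoff rule, in Python, using the one-dimensional
x^2 example as a stand-in for the error function.  The names
`run_until_solved` and `cutoff`, and the value 1e-3, are hypothetical choices
for illustration; a real application would set "near zero" for itself:

```python
def run_until_solved(error, d_error, x0, cutoff, max_steps=100):
    """Take Newton zero-finding steps on `error` only while it exceeds `cutoff`.

    Returns the final x and the number of update steps actually taken.
    """
    x = x0
    steps = 0
    while error(x) > cutoff and steps < max_steps:
        x = x - error(x) / d_error(x)   # Newton step toward a zero of the error
        steps += 1
    return x, steps


error = lambda x: x * x      # stand-in error function (assumption)
d_error = lambda x: 2.0 * x  # its derivative

# Starting with a large error: updates continue until the error drops below the cutoff.
x_final, n_steps = run_until_solved(error, d_error, 1.0, cutoff=1e-3)
print(x_final, n_steps)      # 0.03125 5

# Starting already below the cutoff: nothing is updated at all.
x_same, n_none = run_until_solved(error, d_error, 0.01, cutoff=1e-3)
print(x_same, n_none)        # 0.01 0
```

The second call is the "if it ain't broke, don't fix it" case: the error is
already within the cutoff, so no learning step is taken.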