Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!uakari.primate.wisc.edu!caen!uwm.edu!csd4.csd.uwm.edu!markh
From: markh@csd4.csd.uwm.edu (Mark William Hopkins)
Newsgroups: comp.ai.neural-nets
Subject: Re: Backpropagation with Newton's Method, and recurrence. Source code.
Message-ID: <7684@uwm.edu>
Date: 16 Nov 90 03:21:06 GMT
References: <7607@uwm.edu> <1248@helens.Stanford.EDU>
Sender: news@uwm.edu
Organization: University of Wisconsin - Milwaukee
Lines: 41

In article <1248@helens.Stanford.EDU> wan@isl.Stanford.EDU (Eric A. Wan) writes:
>You mention the method does not work well for f(x) = x^2. In fact, Newton's
>Method applied to the gradient of x^2 converges in one step.

You lost me here. Try using Newton's method to find a zero of x^2. This is
what I was describing. It has the same (relatively poor) performance as
binary search, halving the distance to the zero at each step:

     x[n+1] = x[n] - x[n]^2/(2*x[n]) = 1/2 * x[n].

>Using Newton's method to find a zero of the mse surface does not seem like a
>very wise thing to do. The mse surface almost never has a zero in a real
>problem. The goal of an optimization problem like training a neural network
>is to minimize an objective function, not to find a zero in that function
>(especially if one does not exist).

I simply disagree. You have to concede: it's a valid point to say that the
goal is to find a zero or near-zero of the error function. It's no use
finding any minima if they aren't 'near' zero. And if there aren't any
minima AT ALL near a zero on the surface, then how could it even make sense
to talk about convergence in the first place? A "solution" with a
significant error is simply not a solution, even if it's "optimum".

It goes the other way around too: if you're already near a zero, then going
further to find a minimum is just plain counter-productive. You know the
saying: "if it ain't broke, don't fix it".
All you end up doing is forcing the net to learn something that, in the
judgement of the people who defined "near" for that particular application,
it ALREADY 'knows'.

(On Newton's Method never fully converging):
>On the other hand, it will also never reach a stable solution unless you
>impose arbitrary halting rules.

I pointed this out, but argued that it is a positive feature: it lets the
neural net autonomously determine when it learns and when it doesn't. The
particular halting rule I described was not arbitrary but followed directly
from the considerations above about when you can say something is "solved"
and when it's not "solved". You just provide a cutoff on the error function
itself.
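
The two readings of "Newton's method on x^2" discussed above, and the
error-cutoff halting rule, can be illustrated with a small sketch. This is
not the source code referred to in the subject line. The function names, the
starting point, and the cutoff value are all hypothetical choices made for
illustration only.

```python
def newton_root_step(x):
    """One Newton step toward a ZERO of f(x) = x^2: x - f(x)/f'(x).

    Simplifies to x/2, so the distance to the zero halves each step.
    Undefined at x == 0, where f'(x) == 0.
    """
    return x - (x * x) / (2.0 * x)


def newton_min_step(x):
    """One Newton step toward a MINIMUM of f(x) = x^2: x - f'(x)/f''(x).

    With f'(x) = 2x and f''(x) = 2 this lands on 0 in a single step.
    """
    return x - (2.0 * x) / 2.0


def iterate_until_cutoff(x0, cutoff, max_steps=1000):
    """Apply the root-finding step until the error x^2 falls below cutoff.

    cutoff is an assumed, application-defined meaning of "near zero".
    It must be positive, which also keeps x away from the division by zero.
    Returns the final x and the number of steps taken.
    """
    x, steps = x0, 0
    while x * x >= cutoff and steps < max_steps:
        x = newton_root_step(x)
        steps += 1
    return x, steps


if __name__ == "__main__":
    print(newton_root_step(4.0))             # halves the iterate
    print(newton_min_step(3.0))              # reaches the minimum at once
    print(iterate_until_cutoff(1.0, 1e-6))   # stops once x^2 < cutoff
```

With a cutoff in place the zero-finding iteration stops on its own once the
error is judged small enough, which is the halting rule described above. The
one-step behaviour of newton_min_step is the case the quoted article has in
mind.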