Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!hplabs!nsc!taux01!cyusta
From: cyusta@taux01.UUCP ( Yuval Shahar)
Newsgroups: comp.ai.neural-nets
Subject: Re: back propagation problems....
Summary: Does it really work??
Message-ID: <2846@taux01.UUCP>
Date: 12 Nov 89 08:22:53 GMT
References: <89312.121049LAL102@PSUVM.BITNET>
Reply-To: cyusta@nsc.nsc.com ( Yuval Shachar )
Organization: National Semiconductor (IC) Ltd, Israel
Lines: 27

>--------------------------------------------------
Lik Alaric Lau writes:
> I have written a neural network simulation program in Pascal. This
> simulation uses back propagation as training algorithm. However, when
> the network is trained to recognize more than 1 training pairs, it
> tends to "forget" the previous training sets.

This is something I am experiencing myself now. The problem seems to me to arise from the nature of the gradient descent: the error terms are calculated according to the gradient of the error function Ep(W), where W is the set of weights, and Ep is the error for the presentation of the p'th exemplar to be learned. The total error E for a set of P exemplars is therefore the sum(Ep) for p=1..P.

In order to perform a gradient descent in E it is clear you may not update the weights after each presentation. If you do you actually perform a gradient descent for each Ep, and thus each exemplar will be learned but forgotten as you perform the corrections for the next exemplar. PDP (chapter 8 I think) have commented that the weights may be changed after each presentation if the learning factor, Miu, is small enough, as this will be "close enough" to a gradient descent in E.

The results I'm seeing are disappointing to me. I have tried updating the weights after each presentation, after presenting a set of exemplars, and even after each exemplar but with a delta-rule which is updated so that it still performs a true gradient descent in E.
The net does learn a set of exemplars sometimes, but more often than not, it converges to a really bad local minimum for a set (the equivalent of learning and forgetting), and I am not talking about big sets here (actually, a set with more than one exemplar is enough :-)). Is this the true nature of backprop or is there more to this??
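
To make the difference between the two update schedules concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Pascal simulator from the quoted post and not a full backprop net: it uses a single linear unit with squared error, so it has no local minima and only shows how per-presentation updates drift away from a gradient step in E as the learning factor grows. All function names, the two exemplars, and the learning factors are made up for the example.

```python
# Sketch only: a single linear unit trained by the delta rule (an assumption;
# the post is about multilayer backprop). Ep = 0.5 * (t - w.x)^2, E = sum(Ep).

def output(w, x):
    """Output of the linear unit for input x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pattern_grad(w, x, t):
    """Gradient of Ep with respect to the weights for one exemplar (x, t)."""
    err = t - output(w, x)
    return [-err * xi for xi in x]

def total_error(w, X, T):
    """Total error E, the sum of Ep over all exemplars."""
    return sum(0.5 * (t - output(w, x)) ** 2 for x, t in zip(X, T))

def train_per_pattern(w, X, T, lr, epochs):
    """Update the weights after each presentation (descent in each Ep)."""
    w = list(w)
    for _ in range(epochs):
        for x, t in zip(X, T):
            g = pattern_grad(w, x, t)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def train_batch(w, X, T, lr, epochs):
    """Accumulate the gradient over the whole set, then update (descent in E)."""
    w = list(w)
    for _ in range(epochs):
        g = [0.0] * len(w)
        for x, t in zip(X, T):
            g = [gi + pi for gi, pi in zip(g, pattern_grad(w, x, t))]
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def schedule_gap(X, T, lr):
    """Distance between the weights after one epoch of each schedule."""
    w0 = [0.0] * len(X[0])
    a = train_per_pattern(w0, X, T, lr, 1)
    b = train_batch(w0, X, T, lr, 1)
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

# Two hypothetical exemplars whose inputs overlap, so each update disturbs the other.
X = [[1.0, 0.0], [1.0, 1.0]]
T = [1.0, -1.0]

if __name__ == "__main__":
    for lr in (0.01, 0.5):
        print("learning factor", lr, "gap after one epoch", schedule_gap(X, T, lr))
```

In this toy case the gap between the two schedules after one epoch shrinks with the square of the learning factor, which is one way to read the PDP remark that per-presentation updates are "close enough" when Miu is small. It does not settle whether a multilayer net gets stuck in a bad local minimum; that depends on the error surface, which this linear sketch does not model.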