Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!hplabs!nsc!taux01!cyusta
From: cyusta@taux01.UUCP ( Yuval Shahar)
Newsgroups: comp.ai.neural-nets
Subject: Re: back propagation problems....
Summary: Does it really work??
Message-ID: <2846@taux01.UUCP>
Date: 12 Nov 89 08:22:53 GMT
References: <89312.121049LAL102@PSUVM.BITNET>
Reply-To: cyusta@nsc.nsc.com ( Yuval Shachar )
Organization: National Semiconductor (IC) Ltd, Israel
Lines: 27


>-------------------------------------------------- Lik Alaric Lau writes:
> I have written a neural network simulation program in Pascal. This
>simulation uses back propagation as training algorithm. However, when the
>network is trained to recognize more than 1 training pairs, it tends to
>"forget" the previous training sets.

   This is something I am experiencing myself now. The problem seems to me to
arise from the nature of gradient descent: the error terms (and hence the
weight changes) are calculated from the gradient of the error function Ep(W),
where W is the set of weights and Ep is the error for the presentation of the
p'th exemplar to be learned. The total error E for a set of P exemplars is
therefore E(W) = sum(Ep(W)) for p=1..P, and its gradient is the sum of the
gradients of the individual Ep. To perform a gradient descent in E, then,
strictly speaking you should not update the weights after each presentation.
If you do, you actually perform a gradient descent on each Ep in turn, and
thus each exemplar may be learned but then forgotten as you make the
corrections for the next exemplar.
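A minimal sketch of the two update schedules, assuming a single linear unit with squared error Ep = 0.5*(t_p - w.x_p)^2 instead of a full backprop net (the simplification, the data, and the function names are hypothetical, for illustration only). batch_epoch sums the per-exemplar gradients at one fixed W, which is a true gradient step on E; online_epoch steps down each Ep in turn, so each gradient is taken at weights the previous exemplar has already moved.

```python
import numpy as np

def pattern_grad(w, x, t):
    """Gradient of Ep(w) = 0.5 * (t - w.x)**2 for one linear unit."""
    return -(t - w @ x) * x

def total_error(w, X, T):
    """E(w) = sum over p of Ep(w)."""
    return 0.5 * np.sum((T - X @ w) ** 2)

def batch_epoch(w, X, T, mu):
    """True gradient descent on E: all gradients taken at the same w, one update."""
    grad_E = sum(pattern_grad(w, x, t) for x, t in zip(X, T))
    return w - mu * grad_E

def online_epoch(w, X, T, mu):
    """One update per presentation: descend Ep for each exemplar in turn."""
    for x, t in zip(X, T):
        w = w - mu * pattern_grad(w, x, t)  # gradient taken at the already-moved w
    return w

# Two exemplars that share weight w[0].
X = np.array([[1.0, 0.0],    # exemplar 1, target 1
              [1.0, 1.0]])   # exemplar 2, target 0
T = np.array([1.0, 0.0])
w0 = np.zeros(2)

# With mu = 1, exemplar 1 is learned exactly (w = [1, 0], E1 = 0), then the
# correction for exemplar 2 moves w to [0, -1], where E1 = 0.5 again.
w_online = online_epoch(w0, X, T, mu=1.0)
e1_after = 0.5 * (T[0] - w_online @ X[0]) ** 2
print(w_online, e1_after)
```

With a large step this reproduces "learned, then forgotten" in miniature; the batch step from the same start instead moves along the summed gradient of E.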
   The PDP book (chapter 8, I think) comments that the weights may be changed
after each presentation if the learning factor, Miu, is small enough, since
this will then be "close enough" to a gradient descent in E.
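To illustrate the PDP remark, here is a sketch (same single-linear-unit assumption; data and names are hypothetical) that measures how far one epoch of per-presentation updates lands from one true gradient step on E, starting from the same weights. The step itself shrinks like Miu, but the gap between the two shrinks roughly like Miu squared, which is the sense in which small-Miu online updating is "close enough".

```python
import numpy as np

def pattern_grad(w, x, t):
    # Gradient of Ep(w) = 0.5 * (t - w.x)**2 for one linear unit (simplifying assumption).
    return -(t - w @ x) * x

def batch_step(w, X, T, mu):
    # True gradient descent on E = sum_p Ep: every gradient taken at the same w.
    return w - mu * sum(pattern_grad(w, x, t) for x, t in zip(X, T))

def online_epoch(w, X, T, mu):
    # One update per presentation, each gradient taken at the already-updated w.
    for x, t in zip(X, T):
        w = w - mu * pattern_grad(w, x, t)
    return w

def gap(mu, w, X, T):
    # Distance between one online epoch and one batch step from the same start.
    return np.linalg.norm(online_epoch(w, X, T, mu) - batch_step(w, X, T, mu))

# Hypothetical data: three exemplars, two weights.
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.5, -1.0]])
T = np.array([1.0, 0.0, 0.5])
w0 = np.zeros(2)

for mu in (0.1, 0.05, 0.025):
    step = np.linalg.norm(batch_step(w0, X, T, mu) - w0)
    # Halving mu roughly halves the step but quarters the gap.
    print(f"mu={mu}: step={step:.5f}  gap={gap(mu, w0, X, T):.6f}")
```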
   The results I'm seeing are disappointing. I have tried updating the
weights after each presentation, after presenting a whole set of exemplars,
and even after each exemplar but with a modified delta rule that still
performs a true gradient descent in E. The net does sometimes learn a set of
exemplars, but more often than not it converges to a really bad local minimum
for the set (the equivalent of learning and forgetting), and I am not talking
about big sets here (actually, a set with more than one exemplar is enough :-)).
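The third scheme above is described only in words. One plausible reading is that each exemplar's backprop terms are computed at fixed weights and accumulated, with the sum applied once per pass, which makes the update exactly dE/dW. A cheap way to confirm that an implementation really follows dE/dW is to compare that accumulated gradient against a central-difference estimate of E. The sketch below assumes a 2-2-1 sigmoid net on XOR with squared error; the network, data, and function names are hypothetical and not taken from the post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    W1, b1, W2, b2 = params
    h = sigmoid(W1 @ x + b1)   # hidden activations
    y = sigmoid(W2 @ h + b2)   # output activations
    return h, y

def total_error(params, X, T):
    # E = sum_p Ep, with Ep = 0.5 * sum_k (t_pk - y_pk)**2
    return sum(0.5 * np.sum((t - forward(params, x)[1]) ** 2) for x, t in zip(X, T))

def backprop_grad(params, X, T):
    # Backprop terms for each exemplar, all at the SAME weights, summed: dE/dW.
    W2 = params[2]
    grads = [np.zeros_like(p) for p in params]
    for x, t in zip(X, T):
        h, y = forward(params, x)
        delta_out = -(t - y) * y * (1 - y)            # dEp/d(net input of output)
        delta_hid = (W2.T @ delta_out) * h * (1 - h)  # dEp/d(net input of hidden)
        grads[0] += np.outer(delta_hid, x)
        grads[1] += delta_hid
        grads[2] += np.outer(delta_out, h)
        grads[3] += delta_out
    return grads

def numeric_grad(params, X, T, eps=1e-6):
    # Central-difference estimate of dE/dW, one weight at a time.
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            e_plus = total_error(params, X, T)
            p[idx] = old - eps
            e_minus = total_error(params, X, T)
            p[idx] = old                              # restore the weight
            g[idx] = (e_plus - e_minus) / (2 * eps)
        grads.append(g)
    return grads

# Hypothetical test problem: XOR on a 2-2-1 net with small random weights.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
rng = np.random.default_rng(0)
params = [rng.normal(scale=0.5, size=(2, 2)), rng.normal(scale=0.5, size=2),
          rng.normal(scale=0.5, size=(1, 2)), rng.normal(scale=0.5, size=1)]

ana = backprop_grad(params, X, T)
num = numeric_grad(params, X, T)
print(max(np.max(np.abs(a - n)) for a, n in zip(ana, num)))  # should be tiny

# One true gradient-descent step on E, applied once per pass over the set.
mu = 0.5
params = [p - mu * g for p, g in zip(params, ana)]
```

If the two gradients disagree, the implementation has a bug; if they agree, the gradient computation itself is probably not the culprit, and poor convergence has to be looked for in the error surface or the choice of Miu.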
   Is this the true nature of backprop or is there more to this??