Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!blake!james
From: james@blake.acs.washington.edu (James Taylor)
Newsgroups: comp.ai.neural-nets
Subject: Re: Does back-propagation work with a wider dynamic range...?
Keywords: Back-propagation, dynamic range....
Message-ID: <2166@blake.acs.washington.edu>
Date: 25 May 89 20:22:52 GMT
References: <18635@vax5.CIT.CORNELL.EDU>
Reply-To: james@blake.acs.washington.edu (James Taylor)
Organization: University of Washington, Seattle
Lines: 64

In article <18635@vax5.CIT.CORNELL.EDU> writes:
[...]
> I would like to train a feedforward net with input-output patterns
> that have a wider dynamic range. For example the outputs of the net
> will vary , say, between -4.0 ,4.0 or for some cases you don't even
> know the range, because NN will be a part of a dynamic system with
> a certain degree of freedom.
>
> So what I did? I modified the activation function for the output
> units and used f(x)=x, also made the necessary changes in the
> error derivation where you need the derivative of f(x). Anyway
> I got very bad results, huge numbers in the order of billions.
>
> Kemal Ciliz
> olky@vax5.cit.cornell.edu
> mkciliz@cmx.npac.syr.edu
[...]

I have done similar things: scaling problem patterns to <0,1> or <-n,n>.
I ran into that sort of unstable training behavior in the magnitude of
the weights when I screwed up the feedback for the network. The
experiments I ran worked OK as long as I guaranteed that the feedback
really did go to zero as the node activation went to +-n, i.e.
if (ignoring subscripts, since everything is at node i, and
approximately following the notation of the PDP books)

    sig(y) = 2n/(1+exp[-y]) - n
           = n*(1-exp[-y])/(1+exp[-y])

    dsig(y)/dy = 2n*exp[-y]/(1+exp[-y])^2
               = (n+sig(y))*(n-sig(y))/(2n)
               = (n+x)*(n-x)/(2n)
               = sig'(y)

which implies that

    Delta_W = delta*alpha*x
            = -sig'(y) * SUM[of delta*W next layer] * alpha*x

Written this way, sig'(y) goes to zero as x goes to +-n. Now if n > 1
and the feedback is instead computed as n*(1+x)*(1-x), which is only
right (up to a constant factor) for n = 1, then when | sig(y) | > 1
that term changes sign and the feedback Delta_W becomes a **positive**
feedback. Bad.

The simplest solution is to make n <= 1 and scale the output of the
network. Put a fixed gain at the output, scale your target output to
[-1,+1] during training, then rescale during testing if you really want
to see the larger-magnitude outputs.

I played a little with alternate sigmoid definitions, but in every case
I came out with a similar problem and a much more complex training
algorithm. If you come up with an alternate solution I'd be very
interested.

James Taylor
james@uw-isdl.ee.washington.edu
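
For anyone who wants to check the sign argument numerically, here is a
minimal sketch in Python. It is not code from the experiments described
above. All function names are made up for illustration, and the range
[-4, 4] is borrowed from the quoted question. It assumes the sigmoid
2n/(1+exp(-y)) - n, evaluated as n*tanh(y/2) (algebraically the same,
but safe for large |y|). Differentiating that gives (n+x)*(n-x)/(2n) in
terms of the output x, which the sketch compares against the
(1+x)*(1-x) form that flips sign once |x| > 1. The last two helpers
sketch the "scale the targets, rescale the output" workaround.

```python
import math


def sig(y, n=1.0):
    """Sigmoid scaled to (-n, n): 2n/(1+exp(-y)) - n == n*tanh(y/2)."""
    return n * math.tanh(y / 2.0)


def dsig_from_output(x, n=1.0):
    """d sig/dy written in terms of the output x = sig(y).

    Goes to zero as x approaches +n or -n and never changes sign there.
    """
    return (n + x) * (n - x) / (2.0 * n)


def dsig_unit_form(x):
    """The (1+x)(1-x) form; it goes negative as soon as |x| > 1."""
    return (1.0 + x) * (1.0 - x)


def to_unit(t, lo, hi):
    """Map a target t from [lo, hi] onto [-1, 1] for training."""
    return 2.0 * (t - lo) / (hi - lo) - 1.0


def from_unit(u, lo, hi):
    """Map a network output u in [-1, 1] back onto [lo, hi]."""
    return lo + (u + 1.0) * (hi - lo) / 2.0


if __name__ == "__main__":
    n = 4.0  # assumed output range (-4, 4), as in the quoted question
    for y in (0.0, 1.0, 3.0):
        x = sig(y, n)
        print(y, x, dsig_from_output(x, n), dsig_unit_form(x))
```

With n = 4 the third column stays positive and shrinks toward zero as
the unit saturates, while the fourth column turns negative once the
output passes 1, which is the positive-feedback case. Training on
to_unit(target, -4.0, 4.0) with n = 1 and applying from_unit to the
network output afterwards avoids the issue entirely.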