Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!blake!james
From: james@blake.acs.washington.edu (James Taylor)
Newsgroups: comp.ai.neural-nets
Subject: Re: Does back-propagation work with a wider dynamic range...?
Keywords: Back-propagation, dynamic range....
Message-ID: <2166@blake.acs.washington.edu>
Date: 25 May 89 20:22:52 GMT
References: <18635@vax5.CIT.CORNELL.EDU>
Reply-To: james@blake.acs.washington.edu (James Taylor)
Organization: University of Washington, Seattle
Lines: 64


In article <18635@vax5.CIT.CORNELL.EDU> writes:
[...]
>	I would like to train a feedforward net with input-output patterns
>	that have a wider dynamic range. For example the outputs of the net
>	will vary , say, between -4.0 ,4.0 or for some cases you don't even
>	know the range, because NN will be a part of a dynamic system with
>	a certain degree of freedom. 
>
>	So what I did? I modified the activation function for the output
>	units and used f(x)=x, also made the necessary changes in the 
>	error derivation where you need the derivative of f(x). Anyway
>	I got very bad results, huge numbers in the order of billions.
>
>							Kemal Ciliz
>						olky@vax5.cit.cornell.edu
>						mkciliz@cmx.npac.syr.edu
[...]

I have done similar things: scaling problem patterns to <0,1> or
<-n,n>.  I ran into that sort of unstable training behavior in the
magnitude of the weights when I screwed up the feedback for the network.
The experiments I ran worked OK as long as I guaranteed that the
feedback really did go to zero as the node activation went to +-n, i.e.

if (ignoring subscripts, everything is at node i, approximately
    following the notation of the PDP books)

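sig(y) = {2n/(1+exp[-y])} - n
       = n*(1-exp[-y])/(1+exp[-y])

dsig(y)/dy = n*(2exp[-y])/{(1+exp[-y])^2}
           = (n+sig(y))*(n-sig(y))/(2n)
           = (n+x)*(n-x)/(2n)
           = sig'(y)

(x = sig(y) is the node activation; for n = 1 this is (1+x)*(1-x)/2,
and in general it goes to zero as x goes to +-n.)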
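
A minimal sketch for checking the derivative numerically, assuming the
intended sigmoid is 2n/(1+exp[-y]) - n (equivalently n*tanh(y/2)), so
that its derivative written through the node output x is
(n+x)*(n-x)/(2n).  The function names and the sample points are
illustrative only.

```python
import math

def sig(y, n=1.0):
    """Sigmoid rescaled to the open interval (-n, n); equals n*tanh(y/2)."""
    return 2.0 * n / (1.0 + math.exp(-y)) - n

def dsig_from_output(x, n=1.0):
    """Derivative of sig, written in terms of the node output x = sig(y)."""
    return (n + x) * (n - x) / (2.0 * n)

def dsig_numeric(y, n=1.0, h=1e-6):
    """Central-difference estimate of dsig/dy, used only as a cross-check."""
    return (sig(y + h, n) - sig(y - h, n)) / (2.0 * h)

def max_derivative_error(n, ys=(-3.0, -1.0, 0.0, 0.5, 2.0)):
    """Largest gap between the closed form and the numeric estimate."""
    return max(abs(dsig_from_output(sig(y, n), n) - dsig_numeric(y, n))
               for y in ys)
```

If max_derivative_error(n) stays near zero for the n you care about, the
closed form is safe to use in the feedback.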

Which implies that 

Delta_W = delta *alpha*x
        = -sig'(y)* SUM[of delta*W next layer] *alpha*x


Now if n > 1 and the feedback uses the factor (1+x)*(1-x) for sig'(y),
then when

	| sig(y) | > 1

that factor changes sign and the feedback Delta_W becomes a **positive**
feedback.  Bad.
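
The sign flip can be seen with a few numbers.  This is only a sketch: it
assumes sig(y) = 2n/(1+exp[-y]) - n with n = 4 (the range from the
original question), and compares the factor (1+x)*(1-x) against
(n+x)*(n-x)/(2n), which is the derivative of that assumed sigmoid
written through the output x.  The helper names are made up for the
illustration.

```python
import math

def sig(y, n):
    """Sigmoid rescaled to (-n, n), assumed form: 2n/(1+exp(-y)) - n."""
    return 2.0 * n / (1.0 + math.exp(-y)) - n

def factor_unit(x):
    """(1+x)*(1-x): non-negative only while |x| <= 1."""
    return (1.0 + x) * (1.0 - x)

def factor_scaled(x, n):
    """(n+x)*(n-x)/(2n): non-negative for every |x| <= n."""
    return (n + x) * (n - x) / (2.0 * n)

n = 4.0
for y in (0.0, 0.4, 1.0, 3.0):
    x = sig(y, n)
    print(f"y={y:4.1f}  x={x:6.3f}  "
          f"(1+x)(1-x)={factor_unit(x):8.3f}  "
          f"(n+x)(n-x)/2n={factor_scaled(x, n):6.3f}")
```

Once the output x passes 1 in magnitude, the first factor goes negative
while the second stays positive until x reaches +-n.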

The simplest solution is to make n<=1 and scale the output of the 
network.  Put a fixed gain at the output, scale your target output
to [-1,+1] during training, then rescale during testing if you really
want to see the larger magnitude outputs.
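
A minimal sketch of that scaling step, assuming the target range is
known in advance (here -4.0 to 4.0, as in the original question).  The
helper make_scaler and its return values are hypothetical names, not
part of any package.

```python
def make_scaler(lo, hi):
    """Return (to_unit, from_unit) mapping [lo, hi] <-> [-1, +1]."""
    mid = (hi + lo) / 2.0
    half = (hi - lo) / 2.0
    if half <= 0:
        raise ValueError("hi must be greater than lo")

    def to_unit(t):
        # Applied to the targets before training.
        return (t - mid) / half

    def from_unit(u):
        # Fixed gain (and offset) applied to the network output when testing.
        return u * half + mid

    return to_unit, from_unit

to_unit, from_unit = make_scaler(-4.0, 4.0)
targets = [-4.0, -1.0, 0.0, 2.5, 4.0]
scaled = [to_unit(t) for t in targets]      # what the net is trained on
restored = [from_unit(u) for u in scaled]   # what you read off at test time
```

This does not help when the output range is unknown, which is the harder
case raised in the original question.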

I played a little with alternate sigmoid definitions, but in every
case I came out with a similar problem, with a much more complex
training algorithm.

If you come up with an alternate solution I'd be very interested.

James Taylor

james@uw-isdl.ee.washington.edu