Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!uwm.edu!csd4.csd.uwm.edu!markh
From: markh@csd4.csd.uwm.edu (Mark William Hopkins)
Newsgroups: comp.ai.neural-nets
Subject: Where "Y(1 - Y)" in bp. comes from (was: Re: Dynamic range of nodes)
Message-ID: <8571@uwm.edu>
Date: 27 Dec 90 22:55:02 GMT
References: <1990Dec21.010536.17034@aplcen.apl.jhu.edu> <8513@uwm.edu> <1990Dec22.042610.23800@aplcen.apl.jhu.edu>
Sender: news@uwm.edu
Organization: University of Wisconsin - Milwaukee
Lines: 28

Bob, concerning your backprop. question:

   When you calculate the weight adjustments, you're taking a certain delta
value and multiplying it by the derivative of an activation function.  Your
activation function is

		             y = tanh(n)

Its derivative is:
			   dy/dn = sech**2(n)

which you acknowledged.  BUT, your program, like most neural net simulators,
will express this derivative in terms of y, not n.

   Since sech**2(n) = 1 - tanh**2(n) holds for every n, expressing the
derivative in terms of y gives:

			 dy/dn = 1 - y**2.
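A quick numerical check of this, as a minimal Python sketch (the function names are made up for illustration and are not taken from your program). It computes the derivative both ways, from n via sech**2(n) and from the stored output y via 1 - y**2, and compares each against a central finite difference.

```python
import math

# Derivative of tanh written in terms of the unit's input n: sech**2(n).
def tanh_prime_from_n(n):
    return 1.0 / math.cosh(n) ** 2

# The same derivative written in terms of the unit's output y = tanh(n).
# A simulator can use this form because it already stored y during the
# forward pass and does not need to keep n around.
def tanh_prime_from_y(y):
    return 1.0 - y * y

h = 1e-6  # step size for the finite-difference check
for n in (-2.0, -0.5, 0.0, 0.5, 2.0):
    y = math.tanh(n)
    # Central finite difference as an independent estimate of dy/dn.
    numeric = (math.tanh(n + h) - math.tanh(n - h)) / (2 * h)
    print(f"n={n:+.1f}  sech^2(n)={tanh_prime_from_n(n):.6f}  "
          f"1-y^2={tanh_prime_from_y(y):.6f}  numeric={numeric:.6f}")
```

All three columns should agree to about six decimal places; the only difference is which quantity, n or y, the program has to remember.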


If you apply the same operation to the sigmoid activation function f (that is,
calculate f'(f**-1(y))), then you get

			   y = 1/(1 + exp(-n)),
			   dy/dn = y(1 - y)

(which you used in the part of your program that applied this activation
function).
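The same check for the sigmoid, again as a minimal Python sketch with hypothetical names. It computes f'(f**-1(y)) literally, inverting the sigmoid with the logit ln(y/(1 - y)), and compares the result with y(1 - y). The comments carry the missing algebra step: dy/dn = exp(-n)/(1 + exp(-n))**2, and exp(-n)/(1 + exp(-n)) = 1 - y, so dy/dn = y(1 - y). The output_delta function assumes a squared-error loss at an output unit; it is only there to show where y(1 - y) enters the weight adjustment.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def logit(y):
    # Inverse of the sigmoid, f**-1(y) = ln(y / (1 - y)), valid for 0 < y < 1.
    return math.log(y / (1.0 - y))

def sigmoid_prime_from_n(n):
    # d/dn 1/(1 + exp(-n)) = exp(-n) / (1 + exp(-n))**2
    e = math.exp(-n)
    return e / (1.0 + e) ** 2

def sigmoid_prime_from_y(y):
    # exp(-n)/(1 + exp(-n)) = 1 - y, and 1/(1 + exp(-n)) = y,
    # so the expression above factors as y * (1 - y).
    return y * (1.0 - y)

# f'(f**-1(y)) agrees with y(1 - y):
for y in (0.1, 0.5, 0.9):
    print(y, sigmoid_prime_from_n(logit(y)), sigmoid_prime_from_y(y))

# Hypothetical output-unit delta for one training example, using only the
# stored output y (assumes squared-error loss): error times f'(n).
def output_delta(target, y):
    return (target - y) * sigmoid_prime_from_y(y)
```

Swapping sigmoid_prime_from_y for 1 - y*y is all it takes to move this delta over to a tanh unit.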