Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!uwm.edu!csd4.csd.uwm.edu!markh
From: markh@csd4.csd.uwm.edu (Mark William Hopkins)
Newsgroups: comp.ai.neural-nets
Subject: Where "Y(1 - Y)" in bp. comes from (was: Re: Dynamic range of nodes)
Message-ID: <8571@uwm.edu>
Date: 27 Dec 90 22:55:02 GMT
References: <1990Dec21.010536.17034@aplcen.apl.jhu.edu> <8513@uwm.edu> <1990Dec22.042610.23800@aplcen.apl.jhu.edu>
Sender: news@uwm.edu
Organization: University of Wisconsin - Milwaukee
Lines: 28

Bob, concerning your backprop. question:

When you calculate the weight adjustments, you're taking a certain delta
value and multiplying it by the derivative of an activation function.
Your activation function is

     y = tanh(n)

Its derivative is:

     dy/dn = sech**2(n)

which you acknowledged.  BUT your program, like most neural net simulators,
will express this function in terms of y, not n.  Generally,

     sech**2(n) = 1 - tanh**2(n),

so when expressed in terms of y, it becomes:

     dy/dn = 1 - y**2.

If you apply the same operation to the sigmoid activation function (that is,
calculate f'(f**-1(y))), then you get

     y = 1/(1 + exp(-n)),     dy/dn = y(1 - y)

(which you used in the part of your program that applied this activation
function).
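
For anyone who wants to check the two identities above numerically, here is a minimal sketch in Python. It is not taken from Bob's program. The function names, the sample points and the step size h are assumptions made for illustration. It compares a central-difference estimate of dy/dn with the formula written in terms of the output y.

```python
import math

def tanh_prime_from_output(y):
    # dy/dn for y = tanh(n), written in terms of the output y
    return 1.0 - y * y

def sigmoid(n):
    # logistic activation y = 1/(1 + exp(-n))
    return 1.0 / (1.0 + math.exp(-n))

def sigmoid_prime_from_output(y):
    # dy/dn for the logistic activation, written in terms of the output y
    return y * (1.0 - y)

def numeric_derivative(f, n, h=1e-6):
    # central-difference estimate of f'(n); h is an arbitrary small step
    return (f(n + h) - f(n - h)) / (2.0 * h)

def max_error(f, prime_from_output, points):
    # largest gap between the numeric slope and the formula in terms of y
    return max(abs(numeric_derivative(f, n) - prime_from_output(f(n)))
               for n in points)

# hypothetical sample inputs, chosen only for illustration
points = [-3.0, -1.0, -0.25, 0.0, 0.5, 2.0]

tanh_err = max_error(math.tanh, tanh_prime_from_output, points)
sigmoid_err = max_error(sigmoid, sigmoid_prime_from_output, points)

print("tanh    max error:", tanh_err)
print("sigmoid max error:", sigmoid_err)
```

Both errors should come out tiny, limited only by the finite-difference approximation. The practical point is that the backward pass needs only the stored output y of each node, so n never has to be kept around or the activation recomputed.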