Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!simonof
From: simonof@aplcen.apl.jhu.edu (Simonoff Robert 301 540 1864)
Newsgroups: comp.ai.neural-nets
Subject: Dynamic range of nodes
Message-ID: <1990Dec21.010536.17034@aplcen.apl.jhu.edu>
Date: 21 Dec 90 01:05:36 GMT
Reply-To: simonof@aplcen.apl.edu (Simonoff Robert 301 540 1864)
Followup-To: Robert Simonoff
Organization: Johns Hopkins University
Lines: 181

Netters:

A question on the dynamic range of nodes in a backpropagation network.
The answer should be obvious, but I cannot for the life of me find the
solution.

Below are two code fragments from a backpropagation network I have
written. The first fragment (above the dotted line) works perfectly for
neurons having a dynamic range of [0.0, 1.0]. I decided to rewrite the
code to allow networks to have a range of [-1.0, 1.0] (the second code
fragment). I am under the impression that the activation function must be
changed, as well as the computation of delta, which uses the derivative
of the activation function. I have chosen as my new activation function
the hyperbolic tangent, whose output lies in the range (-1.0, 1.0). The
derivative of this function is:

    tanh'(X) == sech^2(X) == 1 / cosh^2(X)

If anyone can discern what is wrong with the second code fragment, I
would appreciate the help. If I am forgetting to make other changes (I
have already made the administrative changes such as input value range
and output value range), please let me know.

The symptom is that the weights connecting the input layer to the hidden
layer grow rapidly to large magnitudes (both positive and negative), but
the network never converges to an answer. The weights just grow, never
changing sign: if they start positive, they grow to be larger positive
numbers.

The following code is taken from a BP program I have written that works.
I can substitute this code for the code that does not appear to work,
change the -1.0 inputs and outputs to 0.0, and the network works fine.
But when the code below the dotted line is used, the network never
converges.

/*
   w1[node1][node2] = weight from node2 in the input layer
                      to node1 in the hidden layer
   w2[node1][node2] = weight from node2 in the hidden layer
                      to node1 in the output layer

   input_vector[pattern][node] = input node output value for pattern
   out1[pattern][node]         = hidden node output value for pattern
   out2[pattern][node]         = output node output value for pattern
   target[pattern][node]       = target output value for pattern
   delta1[pattern][node]       = delta for hidden node,pattern
   delta2[pattern][node]       = delta for output node,pattern
*/

int compute_outputs(int pattern, int player)
{
   int i,j;
   double netinput;

   for (j=1;j<=nh ;j++ ) {
      netinput = w1[j][nip1];
      for (i=1;i<=ni ;i++ )
         netinput += w1[j][i] * input_vector[pattern][i];
      out1[pattern][j]=1.0/(1.0+exp(-netinput));
   } /* endfor */

   for (j=1;j<=no ;j++ ) {
      netinput = w2[j][nhp1];
      for (i=1;i<=nh ;i++ )
         netinput += w2[j][i] * out1[pattern][i];
      out2[pattern][j]=1.0/(1.0+exp(-netinput));
   } /* endfor */
}

int compute_delta(int pattern, int winner)
{
   int i,m;
   double sum;

   for (i=1;i<=no ;i++ )
      delta2[pattern][i] = (target[pattern][i]-out2[pattern][i]) *
                           out2[pattern][i]*(1.0-out2[pattern][i]);

   for (i=1;i<=nh ;i++ ) {
      sum=0.0;
      for (m=1;m<=no ;m++ )
         sum += delta2[pattern][m] * w2[m][i];
      delta1[pattern][i] = sum * out1[pattern][i]*(1.0-out1[pattern][i]);
   } /* endfor */
}

-----------------------------------------------------------

The following are the routines that I believe should change as a result
of the new dynamic range for the neurons, [-1.0, 1.0]. There are also
administrative changes that include the input values and output values.

int compute_outputs(int pattern)
{
   int j,i;

   for (j=1;j<=nh ;j++ ) {
      netinput = w1[j][nip1];
      for (i=1;i<=ni ;i++ )
         netinput += w1[j][i] * input_vector[pattern][i];
      out1[pattern][j]=tanh(netinput);
   } /* endfor */

   for (j=1;j<=no ;j++ ) {
      netinput = w2[j][nhp1];
      for (i=1;i<=nh ;i++ )
         netinput += w2[j][i] * out1[pattern][i];
      out2[pattern][j]=tanh(netinput);
   } /* endfor */
}

int compute_delta(int pattern)
{
   int i,m;
   double sum;

   for (i=1;i<=no ;i++ )
      delta2[pattern][i] = (target[pattern][i]-out2[pattern][i]) *
                           1.0/(cosh(out2[pattern][i])*
                                cosh(out2[pattern][i]));

   for (i=1;i<=nh ;i++ ) {
      sum=0.0;
      for (m=1;m<=no ;m++ )
         sum += delta2[pattern][m] * w2[m][i];
      delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
                                      cosh(out1[pattern][i]));
   } /* endfor */
}

------------------------------------------------------

Thanks.

Bob Simonoff
simonof@aplcen.apl.edu

--
***********************************************************
Bob Simonoff
simonof@aplcen
Johns Hopkins University