Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!simonof
From: simonof@aplcen.apl.jhu.edu (Simonoff Robert  301 540 1864)
Newsgroups: comp.ai.neural-nets
Subject: Dynamic range of nodes
Message-ID: <1990Dec21.010536.17034@aplcen.apl.jhu.edu>
Date: 21 Dec 90 01:05:36 GMT
Reply-To: simonof@aplcen.apl.edu (Simonoff Robert  301 540 1864)
Followup-To: Robert Simonoff
Organization: Johns Hopkins University
Lines: 181


Netters: A question on the dynamic range of nodes in a
backpropagation network.  The answer should be obvious, but
I cannot for the life of me find the solution.  Below are
two code fragments from a backpropagation network I have
written.  The first fragment (above the dotted line) works
perfectly for neurons having a dynamic range of [0.0, 1.0].
I decided to rewrite the code so as to allow networks to
have a range of [-1.0, 1.0] (the second code fragment).  I
am under the impression that the activation function must be
changed as well as the computation of delta which uses the
derivative of the activation function.

I have chosen as my new activation function the hyperbolic
tangent function, whose range is (-1.0, 1.0).  The
derivative of this function is:
                2            1
tanh'(X) == sech (X)  ==  ---------
                              2
                          cosh (X)

If anyone can discern what is wrong with the second code
fragment, I would appreciate the help.  If I am forgetting
to make other changes (I have already made the
administrative changes such as input value range and output
value range) please notify me.

The symptom is that the weights connecting the input layer
to the hidden layer grow rapidly to large magnitudes (both
positive and negative).  The network never converges to an
answer; the weights just keep growing without changing sign
(if they start positive, they grow to be larger positive
numbers).


The following code is taken from a BP program I have written
that works.  If I substitute this code for the code that
does not appear to work, and change the -1.0 inputs and
outputs to 0.0, the network works fine.  But when
the code below the dotted line is used, the network never
converges.


/* w1[node1][node2] = weight from node2 in the input layer
                      to node1 in the hidden layer

   w2[node1][node2] = weight from node2 in the hidden layer
                      to node1 in the output layer

   input_vector[pattern][node] = input node output value
                                 for pattern

   out1[pattern][node] = hidden node output value for
                         pattern

   out2[pattern][node] = output node output value for
                         pattern

   target[pattern][node] = target output value for pattern

   delta1[pattern][node] = delta for hidden node,pattern

   delta2[pattern][node] = delta for output node,pattern
*/

int compute_outputs(int pattern, int player)
{
   int i,j;
   double netinput;


   for (j=1;j<=nh ;j++ )
      {
         netinput = w1[j][nip1];
         for (i=1;i<=ni ;i++ )
            netinput += w1[j][i] * input_vector[pattern][i];

         out1[pattern][j]=1.0/(1.0+exp(-netinput));
      } /* endfor */

   for (j=1;j<=no ;j++ )
      {
         netinput = w2[j][nhp1];
         for (i=1;i<=nh ;i++ )
            netinput += w2[j][i] * out1[pattern][i];

         out2[pattern][j]=1.0/(1.0+exp(-netinput));
      } /* endfor */
}


int compute_delta(int pattern, int winner)
{
   int i,m;
   double sum;

   for (i=1;i<=no ;i++ )
      delta2[pattern][i] = (target[pattern][i]-out2[pattern][i]) *
                             out2[pattern][i]*(1.0-out2[pattern][i]);

   for (i=1;i<=nh ;i++ ) {
      sum=0.0;

      for (m=1;m<=no ;m++ )
         sum += delta2[pattern][m] * w2[m][i];

      delta1[pattern][i] = sum * out1[pattern][i]*(1.0-out1[pattern][i]);
   } /* endfor */
}


-----------------------------------------------------------

The following are the routines that I believe should change
as a result of the new dynamic range for the neurons [-1.0, 1.0].
There are also administrative changes that include the input
values and output values.
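
The administrative change to the input and target values amounts to
a linear rescaling between the two ranges.  A minimal sketch,
assuming the old values lie in [0.0, 1.0]; the helper names are
hypothetical:

```c
/* Map a value from the [0.0, 1.0] range onto [-1.0, 1.0]. */
double to_bipolar(double v)
{
    return 2.0 * v - 1.0;
}

/* Inverse mapping, from [-1.0, 1.0] back onto [0.0, 1.0]. */
double to_unipolar(double v)
{
    return 0.5 * (v + 1.0);
}
```

Since tanh only approaches -1.0 and 1.0 asymptotically, targets of
exactly those values can never be reached; targets slightly inside
the range are a common workaround.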

int compute_outputs(int pattern)
   {
      int j,i;
      double netinput;   /* declared locally, as in the first fragment */

      for (j=1;j<=nh ;j++ )
         {
            netinput = w1[j][nip1];
            for (i=1;i<=ni ;i++ )
               netinput += w1[j][i] * input_vector[pattern][i];

            out1[pattern][j]=tanh(netinput);
         } /* endfor */

      for (j=1;j<=no ;j++ )
         {
            netinput = w2[j][nhp1];
            for (i=1;i<=nh ;i++ )
               netinput += w2[j][i] * out1[pattern][i];

            out2[pattern][j]=tanh(netinput);
         } /* endfor */
   }


int compute_delta(int pattern)
{
   int i,m;
   double sum;


   for (i=1;i<=no ;i++ )
      delta2[pattern][i] = (target[pattern][i]-out2[pattern][i]) *
                             1.0/(cosh(out2[pattern][i])*
                                  cosh(out2[pattern][i]));

   for (i=1;i<=nh ;i++ ) {
      sum=0.0;

      for (m=1;m<=no ;m++ )
         sum += delta2[pattern][m] * w2[m][i];

      delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
                              cosh(out1[pattern][i]));
  } /* endfor */
}
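
One possible cause, offered as a sketch rather than a confirmed
diagnosis: the delta routine above passes the node output to cosh(),
whereas sech^2 should be evaluated at the net input.  With outputs
confined to (-1.0, 1.0), 1/cosh^2(out) never falls below about 0.42,
so the deltas do not shrink as a node saturates.  Writing the
derivative as 1 - out^2 avoids storing the net input.  The array
bounds, the node counts and the name compute_delta_tanh are
assumptions made only to keep the example self-contained:

```c
#define MAXP 4   /* hypothetical bounds, for this sketch only */
#define MAXN 8

int nh = 2, no = 1;                 /* hidden and output node counts */
double w2[MAXN][MAXN];
double out1[MAXP][MAXN], out2[MAXP][MAXN];
double target[MAXP][MAXN];
double delta1[MAXP][MAXN], delta2[MAXP][MAXN];

void compute_delta_tanh(int pattern)
{
    int i, m;
    double sum;

    /* Output layer: error times tanh'(net) == 1 - out^2. */
    for (i = 1; i <= no; i++)
        delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
                             (1.0 - out2[pattern][i] * out2[pattern][i]);

    /* Hidden layer: back-propagated error times 1 - out^2. */
    for (i = 1; i <= nh; i++) {
        sum = 0.0;
        for (m = 1; m <= no; m++)
            sum += delta2[pattern][m] * w2[m][i];

        delta1[pattern][i] = sum *
                             (1.0 - out1[pattern][i] * out1[pattern][i]);
    }
}
```

This mirrors the out*(1.0-out) pattern of the logistic version, where
the derivative is likewise computed from the node output alone.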


------------------------------------------------------

Thanks.
Bob Simonoff
simonof@aplcen.apl.edu

-- 
***********************************************************
Bob Simonoff
simonof@aplcen
Johns Hopkins University