Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!greenba
From: greenba@gambia.crd.ge.com (ben a green)
Newsgroups: comp.ai.neural-nets
Subject: Re: The first few epochs in BP
Message-ID: <GREENBA.91Apr3131608@gambia.crd.ge.com>
Date: 3 Apr 91 18:16:08 GMT
References: <6882@rex.cs.tulane.edu>
Sender: news@crdgw1.crd.ge.com
Organization: GE Corporate Research & Development
Lines: 32
In-reply-to: georgiou@rex.cs.tulane.edu's message of 3 Apr 91 00:11:33 GMT

In article <6882@rex.cs.tulane.edu> georgiou@rex.cs.tulane.edu (George Georgiou) writes:

   For those who worked with Back-Propagation: Have you noticed any
   chaotic behavior in the graph of the (usual) error function vs epochs?
   Specifically, during the first 2 or 3 epochs the value of the error
   would jump all over the place, but afterwards becomes smooth.

   Only once I saw this behavior in the literature. It was in a graph in
   a paper in a respected publication, but it was ignored.

   Is this symptomatic of gradient descent procedures?

It is not characteristic of all gradient descent procedures, but it is
characteristic of a common back-prop procedure of updating weights
before collecting errors on the whole training set.

And it is characteristic of the usual back-prop technique of using a
constant learning rate.
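
One way a constant learning rate can make the error jump is plain
overshoot. The sketch below is only an illustration, not anything from
the original question: it assumes a one-weight quadratic error
0.5*curvature*w**2, and the function name and the numbers are made up.
With a fixed rate below 2/curvature the error falls every step; above
that threshold each step overshoots the minimum and the error grows.

```python
def constant_rate_errors(curvature, rate, w0=1.0, steps=5):
    """Return the error 0.5*curvature*w**2 before and after each fixed-rate step."""
    w = w0
    errors = [0.5 * curvature * w * w]
    for _ in range(steps):
        grad = curvature * w      # dE/dw for the assumed quadratic error
        w -= rate * grad          # constant learning rate, no step control
        errors.append(0.5 * curvature * w * w)
    return errors

# Hypothetical values: the stability threshold here is 2 / 4.0 = 0.5.
small = constant_rate_errors(curvature=4.0, rate=0.1)  # below threshold
large = constant_rate_errors(curvature=4.0, rate=0.6)  # above threshold
```

A real network's error surface is not a single quadratic, so this shows
only the mechanism, not the behavior of any particular net.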

To get a numerically stable procedure, collect errors over the whole
training set, compute the gradient direction of the error in weight
space, and do a line search along that direction to find a minimum.
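
A minimal sketch of that procedure, under loud assumptions: the model
is a two-weight linear fit rather than a real network, the data are
invented, and the line search is simple step halving (backtracking),
which only finds a step that lowers the error rather than the exact
minimum along the line. All names here are hypothetical.

```python
def batch_error(w, data):
    """Sum-of-squares error over the whole training set."""
    return 0.5 * sum((w[0] + w[1] * x - y) ** 2 for x, y in data)

def batch_gradient(w, data):
    """Gradient of batch_error, accumulated over every pattern before any update."""
    g0 = sum((w[0] + w[1] * x - y) for x, y in data)
    g1 = sum((w[0] + w[1] * x - y) * x for x, y in data)
    return [g0, g1]

def line_search(w, direction, data, step=1.0, shrink=0.5, max_halvings=40):
    """Halve the step until the batch error drops; keep w if no step helps."""
    e0 = batch_error(w, data)
    for _ in range(max_halvings):
        trial = [wi + step * di for wi, di in zip(w, direction)]
        if batch_error(trial, data) < e0:
            return trial
        step *= shrink
    return w

def train(data, epochs=50):
    w = [0.0, 0.0]
    history = [batch_error(w, data)]
    for _ in range(epochs):
        g = batch_gradient(w, data)
        direction = [-gi for gi in g]        # steepest-descent direction
        w = line_search(w, direction, data)  # one weight update per epoch
        history.append(batch_error(w, data))
    return w, history

# Invented training set for illustration only.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, history = train(data)
```

Because a step is accepted only when it lowers the batch error, the
error recorded per epoch can never go up, which is the stability
property being described.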

Ben

--
Ben A. Green, Jr.              
greenba@crd.ge.com
  Speaking only for myself, of course.