Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!greenba
From: greenba@gambia.crd.ge.com (ben a green)
Newsgroups: comp.ai.neural-nets
Subject: Re: The first few epochs in BP
Message-ID:
Date: 3 Apr 91 18:16:08 GMT
References: <6882@rex.cs.tulane.edu>
Sender: news@crdgw1.crd.ge.com
Organization: GE Corporate Research & Development
Lines: 32
In-reply-to: georgiou@rex.cs.tulane.edu's message of 3 Apr 91 00:11:33 GMT

In article <6882@rex.cs.tulane.edu> georgiou@rex.cs.tulane.edu (George Georgiou) writes:

   For those who worked with Back-Propagation: Have you noticed any
   chaotic behavior in the graph of the (usual) error function vs
   epochs? Specifically, during the first 2 or 3 epochs the value of
   the error would jump all over the place, but afterwards becomes
   smooth. Only once I saw this behavior in the literature. It was in
   a graph in a paper in a respected publication, but it was ignored.

   Is this symptomatic of gradient descent procedures?

It is not characteristic of all gradient descent procedures, but it
is characteristic of a common back-prop procedure: updating the
weights before errors have been collected over the whole training set.

It is also characteristic of the usual back-prop technique of using a
constant learning rate.

To get a numerically stable procedure, collect errors over the whole
training set, compute the gradient direction of the error in weight
space, and do a line search along that direction to find a minimum.

Ben
--
Ben A. Green, Jr.
greenba@crd.ge.com
  Speaking only for myself, of course.
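
The batch procedure described above (collect errors over the whole
training set, compute the gradient, then line-search along it) might be
sketched roughly as follows. This is only an illustration, not the
poster's own code. The 2-2-1 sigmoid network, the XOR training set, the
golden-section search, the step-halving safeguard, and every name used
(train, line_search, max_step, and so on) are assumptions chosen for the
example.

```python
import math
import random

# Hypothetical toy problem: XOR, as (inputs, target) pairs.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def sigmoid(z):
    # Written in two branches so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(w, x):
    # w is a flat list of 9 weights: hidden unit j uses w[3j], w[3j+1]
    # and bias w[3j+2]; the output unit uses w[6], w[7] and bias w[8].
    h = [sigmoid(w[3*j] * x[0] + w[3*j+1] * x[1] + w[3*j+2]) for j in range(2)]
    y = sigmoid(w[6] * h[0] + w[7] * h[1] + w[8])
    return h, y

def total_error(w):
    # Sum-of-squares error collected over the whole training set.
    return 0.5 * sum((forward(w, x)[1] - t) ** 2 for x, t in DATA)

def batch_gradient(w):
    # Back-propagate every pattern and accumulate; no weight is changed here.
    g = [0.0] * 9
    for x, t in DATA:
        h, y = forward(w, x)
        d_out = (y - t) * y * (1.0 - y)
        for j in range(2):
            d_hid = d_out * w[6 + j] * h[j] * (1.0 - h[j])
            g[3*j] += d_hid * x[0]
            g[3*j+1] += d_hid * x[1]
            g[3*j+2] += d_hid
            g[6 + j] += d_out * h[j]
        g[8] += d_out
    return g

def line_search(w, g, max_step=8.0, iters=40):
    # Golden-section search for a step s in [0, max_step] that minimizes
    # the total error along the negative gradient direction.
    def phi(s):
        return total_error([wi - s * gi for wi, gi in zip(w, g)])
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, max_step
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    s = 0.5 * (a + b)
    # Safeguard (an assumption of this sketch): the error along the line
    # need not be unimodal, so halve the step until it is no worse than
    # standing still, and give up with a zero step if that never happens.
    e0 = phi(0.0)
    while phi(s) > e0 and s > 1e-12:
        s *= 0.5
    return s if phi(s) <= e0 else 0.0

def train(epochs=50, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(9)]
    errors = [total_error(w)]
    for _ in range(epochs):
        g = batch_gradient(w)          # gradient over the whole set
        s = line_search(w, g)          # step length chosen per epoch
        w = [wi - s * gi for wi, gi in zip(w, g)]
        errors.append(total_error(w))
    return errors

if __name__ == "__main__":
    for epoch, err in enumerate(train(epochs=10)):
        print(epoch, err)
```

Because a step is accepted only if the total error does not rise, the
error-versus-epoch curve in this sketch cannot jump upward. Per-pattern
updates with a constant learning rate carry no such guarantee.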