Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!greenba
From: greenba@gambia.crd.ge.com (ben a green)
Newsgroups: comp.ai.neural-nets
Subject: Re: State of the Art Feed-Forward Network Training Algorithms
Message-ID:
Date: 17 May 91 14:05:57 GMT
References: <1991May17.090435.9180@fwi.uva.nl>
Sender: news@crdgw1.crd.ge.com
Organization: GE Corporate Research & Development
Lines: 79
In-reply-to: smagt@fwi.uva.nl's message of 17 May 91 09:04:35 GMT

In article <1991May17.090435.9180@fwi.uva.nl> smagt@fwi.uva.nl (Patrick van der Smagt) writes:

   aj3u@opal.cs.virginia.edu (Asim Jalis) writes:

   >What is the state of the art in training feed-forward networks.
   ...
   >did not see anything that improved performance over plain
   >Backpropagation drastically for the general case.

   The answer to this question is really very, very simple. Remember
   that teaching a feed-forward network is nothing but minimisation
   of a function

      E = 1/2 \sum_p (\vec d_p - \vec a)^2

   by varying the parameters W. This has been investigated a zillion
   of times. Look up any book on numerical analysis, e.g.,

   %A W. H. Press
   %A B. P. Flannery
   %A S. A. Teukolsky
   %A W. T. Vetterling
   %T Numerical Recipes: The Art of Scientific Computing
   %I Cambridge University Press
   %C Cambridge
   %D 1986

   %A J. Stoer
   %A R. Bulirsch
   %T Introduction to Numerical Analysis
   %I Springer-Verlag
   %C New York--Hei\-del\-berg--Ber\-lin
   %D 1980

   to read about improvements to steepest descent (aka gradient
   descent) minimisation. Amongst the best methods is conjugate
   gradient optimisation. I myself haven't used error
   back-propagation for over a year, but CG instead. It sizzles.

More needs to be said about conjugate gradient optimisation. I agree
fully with Patrick van der Smagt, but the word is about that CG is no
good. There was a thread on this newsgroup recently "Are CG algorithms
any good?"
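For readers who have not seen the method, here is a minimal sketch of CG training applied to the error function E quoted above. Everything specific in it is an assumption made for illustration: the XOR toy data, the 2-3-1 network with tanh hidden units and a linear output, the Polak-Ribiere update, and the simple backtracking line search. It is not the GE implementation, nor the university one mentioned in this post, and real implementations usually use a more careful line search. Note that CG still uses the back-propagated gradient; what changes is the search direction and how the step length is chosen.

```python
import math
import random

# Toy training set (XOR); hypothetical, not the speech data from the post.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_HID = 3  # number of hidden units; an arbitrary choice

def unpack(w):
    """Split the flat weight vector into hidden and output parameters."""
    hid = [w[3 * j:3 * j + 3] for j in range(N_HID)]  # (w_x1, w_x2, bias) per unit
    out = w[3 * N_HID:]                               # N_HID output weights + bias
    return hid, out

def error_and_grad(w):
    """Return E = 1/2 sum_p (d_p - a_p)^2 and dE/dw via back-propagation."""
    hid, out = unpack(w)
    E = 0.0
    g = [0.0] * len(w)
    for (x1, x2), d in DATA:
        h = [math.tanh(u[0] * x1 + u[1] * x2 + u[2]) for u in hid]
        a = sum(out[j] * h[j] for j in range(N_HID)) + out[N_HID]
        delta = a - d                                  # dE/da for a linear output
        E += 0.5 * delta * delta
        for j in range(N_HID):
            g[3 * N_HID + j] += delta * h[j]
            dh = delta * out[j] * (1.0 - h[j] * h[j])  # back through tanh
            g[3 * j] += dh * x1
            g[3 * j + 1] += dh * x2
            g[3 * j + 2] += dh
        g[3 * N_HID + N_HID] += delta                  # output bias
    return E, g

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def line_search(w, E, g, d, step=1.0, shrink=0.5, c=1e-4):
    """Backtracking (Armijo) search along d; returns (w, E, g) or None."""
    slope = dot(g, d)
    while step > 1e-12:
        w_new = [wi + step * di for wi, di in zip(w, d)]
        E_new, g_new = error_and_grad(w_new)
        if E_new <= E + c * step * slope:
            return w_new, E_new, g_new
        step *= shrink
    return None

def train_cg(w, max_iter=200, tol=1e-10):
    """Nonlinear CG (Polak-Ribiere, clipped at zero) on the network error."""
    E, g = error_and_grad(w)
    d = [-gi for gi in g]            # first direction is steepest descent
    history = [E]
    for _ in range(max_iter):
        if dot(g, g) < tol:
            break
        if dot(g, d) >= 0.0:
            d = [-gi for gi in g]    # not a descent direction: restart
        result = line_search(w, E, g, d)
        if result is None:
            break
        w, E, g_new = result
        diff = [a - b for a, b in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
        history.append(E)
    return w, history

random.seed(0)
w0 = [random.uniform(-0.5, 0.5) for _ in range(4 * N_HID + 1)]
w_final, history = train_cg(w0)
print("initial E = %.4f, final E = %.6f after %d line searches"
      % (history[0], history[-1], len(history) - 1))
```

Each call to error_and_grad is one pass over the training set, so a fair comparison with backprop should count those calls, including the trial steps the line search rejects. How presentations were counted in the experiments described in this post is not stated.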
Through the courtesy of a Government Agency, which I do not have
permission to name, I recently ran a test training a net on some
representations of speech sounds. They had found that backprop trained
to 90% accuracy in about 150,000 presentations of the training set,
while CG could not get past 70%. They were using a CG implementation
that shall be nameless, but which is made available to the public by a
university.

My implementation of CG trained to 90% on this problem in 1676
presentations of the training set. That's about 89 times fewer
presentations than backprop needed (150,000 / 1676).

I have no idea why the other implementation failed so badly. There are
many choices to make concerning how to do line searches, for example.

This is not the only example: Another hit on CG was made in the thesis
of a student of a well-known NN researcher in the Northeast. He said
that CG got stuck in local minima. He was kind enough to share the data
with us, and we trained on the problem very quickly with CG and with no
local minimum problem.

For an introduction to CG applied to NN training, see the excellent
article by Kramer and Sangiovanni-Vincentelli in Advances in Neural
Information Processing Systems 1, pp. 40-48, 1989. Buy the book from
Morgan Kaufmann, San Mateo, CA, USA, if you have to.

Please do not ask for my software. GE won't let me give it out.

Ben
--
Ben A. Green, Jr.
greenba@crd.ge.com
Speaking only for myself, of course.