From: pluto@mangani.ucsd.edu (Mark Plutowski)
Newsgroups: comp.ai.neural-nets
Subject: Re: Are Conjugate Gradient algorithms any good?
Keywords: Conjugate Gradient algorithms, Back-propagation
Date: 11 Mar 91 18:26:50 GMT
References: <47034@nigel.ee.udel.edu>

chester@udel.edu (Daniel Chester) writes:
>In his March 6th reply to Denis Anthony, Mark Plutowski made the assertion
>that "the backpropagation update is the special case of the Gauss-Newton
>update obtained by setting the Hessian to the identity matrix." This is
>incorrect; to get something like the backpropagation update, the Hessian
>has to be set to the 0 matrix. Even then it is not the same because it
>does a line search where backpropagation does one step. If you set the
>Hessian to the identity matrix, the Gauss-Newton update becomes a
>conjugate-gradient method. See the following reference for details.

Thank you, I misspoke. We are referring to different objects. I believe
you are referring to the matrix of second derivatives. I was referring to
the matrix that weights the gradient of the squared error in the update
derived from a first-order Taylor expansion of the network function. In
my references (see below) this is called the Gauss-Newton update: the
outer product of the gradient of the network function with respect to
the weights, summed over all training examples, is used as an
approximation to the Hessian, and its inverse weights the gradient. (As
the sample size grows large, the expectation of this matrix is the
Hessian.) Unfortunately, some of the literature mistakenly calls this
matrix the Hessian, as I did above; thank you for pointing this out. I
hope that someday we will be able to post precise equations rather than
verbal descriptions; this would alleviate this kind of confusion!
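A minimal sketch in Python of the two objects being distinguished here. Everything in it is an illustrative assumption, not something from the post: a hypothetical one-unit model f(x; w) = w0 * tanh(w1 * x), made-up data, and hypothetical helper names such as `gauss_newton_step` and `identity_step`. With E(w) = 0.5 * sum_i (y_i - f(x_i; w))^2, gradient g, and J_i the gradient of f(x_i; w) with respect to w, the Gauss-Newton update is delta = -(sum_i J_i J_i^T)^{-1} g. Replacing the summed outer product by (1/eta) I gives delta = -eta g, a batch gradient-descent (backpropagation-style) step with learning rate eta. The exact Hessian of E is sum_i J_i J_i^T - sum_i r_i * (second derivatives of f), with residuals r_i = y_i - f(x_i; w), so the two matrices differ whenever the residuals do not vanish.

```python
import math

# Hypothetical one-unit "network" f(x; w) = w[0] * tanh(w[1] * x).
# The model, data, and function names are illustrative assumptions.


def f(x, w):
    return w[0] * math.tanh(w[1] * x)


def grad_f(x, w):
    """Gradient of the network output with respect to the weights."""
    t = math.tanh(w[1] * x)
    return [t, w[0] * x * (1.0 - t * t)]


def error_gradient(xs, ys, w):
    """Gradient of E(w) = 0.5 * sum_i (y_i - f(x_i; w))^2."""
    g = [0.0, 0.0]
    for x, y in zip(xs, ys):
        r = y - f(x, w)
        j = grad_f(x, w)
        g[0] -= r * j[0]
        g[1] -= r * j[1]
    return g


def outer_product_matrix(xs, w):
    """Sum over examples of grad_f * grad_f^T (the Gauss-Newton matrix)."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        j = grad_f(x, w)
        for a in range(2):
            for b in range(2):
                m[a][b] += j[a] * j[b]
    return m


def solve2(m, v):
    """Solve the 2x2 linear system m @ d = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]


def weighted_step(m, g):
    """Generic update delta = -m^{-1} g."""
    return solve2(m, [-g[0], -g[1]])


def gauss_newton_step(xs, ys, w):
    """Weight the gradient by the inverse of the summed outer product."""
    return weighted_step(outer_product_matrix(xs, w), error_gradient(xs, ys, w))


def identity_step(xs, ys, w, eta):
    """Same update with the matrix replaced by (1/eta) * I, which reduces
    to the batch gradient-descent step delta = -eta * g."""
    scaled_identity = [[1.0 / eta, 0.0], [0.0, 1.0 / eta]]
    return weighted_step(scaled_identity, error_gradient(xs, ys, w))


def hessian(xs, ys, w):
    """Exact Hessian of E: outer-product matrix minus sum_i r_i * (2nd derivs of f)."""
    h = outer_product_matrix(xs, w)
    for x, y in zip(xs, ys):
        r = y - f(x, w)
        t = math.tanh(w[1] * x)
        s = 1.0 - t * t                    # sech^2(w[1] * x)
        d01 = x * s                        # d2f/dw0 dw1 (d2f/dw0^2 is 0)
        d11 = -2.0 * w[0] * x * x * t * s  # d2f/dw1^2
        h[0][1] -= r * d01
        h[1][0] -= r * d01
        h[1][1] -= r * d11
    return h


if __name__ == "__main__":
    w_true = [2.0, 0.5]
    xs = [-2.0, -1.0, 0.5, 1.0, 2.0, 3.0]
    ys = [f(x, w_true) for x in xs]  # noise-free data from the same model
    w = [1.5, 0.8]
    print("Gauss-Newton step:", gauss_newton_step(xs, ys, w))
    print("identity step:    ", identity_step(xs, ys, w, eta=0.1))
    print("outer product:", outer_product_matrix(xs, w))
    print("Hessian:      ", hessian(xs, ys, w))
```

Both steps go through the same `weighted_step` helper, so the gradient-descent step is literally the Gauss-Newton formula with a different weighting matrix. With noise-free data the residual term vanishes at the true weights, so the outer-product matrix and the Hessian coincide there; at other weights they differ.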
Thanks also for the references for obtaining CG from Gauss-Newton.

References:
1) Fedorov, Theory of Optimal Experiments, 1972, p. 35.
2) Seber & Wild, Nonlinear Regression, 1989, pp. 21-23.

-=-=
M.E. Plutowski, pluto%cs@ucsd.edu
UCSD, Computer Science and Engineering 0114
9500 Gilman Drive
La Jolla, California 92093-0114