Path: utzoo!news-server.csri.toronto.edu!rutgers!ucsd!sdcc6!mangani!pluto
From: pluto@mangani.ucsd.edu (Mark Plutowski)
Newsgroups: comp.ai.neural-nets
Subject: Re: Are Conjugate Gradient algorithms any good?
Keywords: Conjugate Gradient algorithms, Back-propagation
Message-ID: <pluto.668716010@mangani>
Date: 11 Mar 91 18:26:50 GMT
References: <47034@nigel.ee.udel.edu>
Sender: news@sdcc6.ucsd.edu
Lines: 42

chester@udel.edu (Daniel Chester) writes:

>In his March 6th reply to Denis Anthony, Mark Plutowski made the assertion
>that "the backpropagation update is the special case of the Gauss-Newton
>update obtained by setting the Hessian to the identity matrix."  This is
>incorrect; to get something like the backpropagation update, the Hessian
>has to be set to the 0 matrix. Even then it is not the same because it
>does a line search where backpropagation does one step.  If you set the
>Hessian to the identity matrix, the Gauss-Newton update becomes a
>conjugate-gradient method.  See the following reference for details.

Thank you, I misspoke.
We are referring to different objects.  I believe you are
referring to the matrix of second derivatives of the error.
I was referring to the matrix that weights the gradient of the squared error
in the update obtained when the descent step for squared error is derived via
a first-order Taylor expansion.
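
In symbols, the matrix in question can be sketched as follows.  The notation
(E, f, r_i, J_i, M, d) is assumed here for illustration and is not taken from
the references; J_i is the row vector of derivatives of the network output
w.r.t. the weights for training example i.

```latex
\begin{align*}
E(w) &= \tfrac{1}{2} \sum_i r_i(w)^2, \qquad
        r_i(w) = y_i - f(x_i; w), \qquad
        J_i = \frac{\partial f(x_i; w)}{\partial w} \\
f(x_i; w + d) &\approx f(x_i; w) + J_i\, d
        \qquad \text{(first-order Taylor expansion)} \\
\Bigl( \sum_i J_i^{\top} J_i \Bigr)\, d &= \sum_i J_i^{\top} r_i = -\nabla E(w) \\
d &= -M^{-1}\, \nabla E(w), \qquad M = \sum_i J_i^{\top} J_i
\end{align*}
```

Replacing M by the identity matrix gives d = -grad E(w), i.e. a plain gradient
step with unit step size.  M is built from first derivatives only; no second
derivatives of f appear in it.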

In my references (see below) this is referred to as the Gauss-Newton update.
The outer product (summed over all training examples) of the gradient of the
network function w.r.t. the weights is used as an approximation to the Hessian,
and its inverse is the matrix that weights the gradient.  (Suitably normalized,
and under conditions such as zero-mean residuals at the solution, this matrix
approaches the Hessian of the expected squared error as the sample size grows
large.)  Unfortunately, it is sometimes mistakenly referred to as the Hessian
in some of the literature, as I did above;
thank you for pointing this out.
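
A minimal numerical sketch of the same point, assuming NumPy and a toy
two-weight model f(x; w) = w[0] * tanh(w[1] * x).  The model, the data, and all
function names are hypothetical choices for illustration, not taken from the
references.  The update is written with a generic weighting matrix M, so the
Gauss-Newton step and the plain gradient step differ only in the choice of M.

```python
import numpy as np

def model(w, x):
    """Toy network function f(x; w) = w[0] * tanh(w[1] * x) (illustrative only)."""
    return w[0] * np.tanh(w[1] * x)

def jacobian(w, x):
    """Rows are the gradients of f(x_i; w) w.r.t. the weights, one per example."""
    t = np.tanh(w[1] * x)
    return np.stack([t, w[0] * x * (1.0 - t ** 2)], axis=1)

def squared_error(w, x, y):
    """E(w) = 1/2 * sum of squared residuals."""
    r = y - model(w, x)
    return 0.5 * np.sum(r ** 2)

def gradient(w, x, y):
    """Gradient of E(w): minus J^T times the residual vector."""
    r = y - model(w, x)
    return -jacobian(w, x).T @ r

def gauss_newton_matrix(w, x):
    """Sum over examples of the outer product of the gradient of f (J^T J)."""
    J = jacobian(w, x)
    return J.T @ J

def weighted_step(w, x, y, M):
    """One update w - M^{-1} grad E(w), for a given weighting matrix M."""
    return w - np.linalg.solve(M, gradient(w, x, y))

# Noise-free toy data generated from known weights.
x = np.linspace(-2.0, 2.0, 20)
w_true = np.array([2.0, 0.5])
y = model(w_true, x)
w = np.array([1.8, 0.6])

w_gn = weighted_step(w, x, y, gauss_newton_matrix(w, x))  # Gauss-Newton step
w_gd = weighted_step(w, x, y, np.eye(2))                  # M = I: gradient step
```

With M set to the identity, weighted_step reduces to w - gradient(w, x, y),
which is the batch gradient step with unit learning rate; the matrix of second
derivatives is never formed in either case.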

I hope that someday we will be able to post precise equations rather than verbal 
descriptions - this would alleviate this kind of confusion!

Thanks also for the references for obtaining CG from Gauss-Newton.


References:
1) Fedorov, Theory of Optimal Experiments, 1972, p. 35.
2) Seber & Wild, Nonlinear Regression, 1989, pp. 21-23.

-=-=
M.E. Plutowski,  pluto%cs@ucsd.edu 

UCSD,  Computer Science and Engineering 0114
9500 Gilman Drive
La Jolla, California 92093-0114