Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!njin!princeton!phoenix!mbkennel
From: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel)
Newsgroups: comp.ai.neural-nets
Subject: Re: Training
Message-ID: <7378@phoenix.Princeton.EDU>
Date: 25 Mar 89 02:19:57 GMT
References: <2698@sun.soe.clarkson.edu> <2351@buengc.BU.EDU> <1577@vicom.COM> <7326@phoenix.Princeton.EDU> <774@cb.ecn.purdue.edu>
Reply-To: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel)
Organization: Princeton University, NJ
Lines: 26

In article <774@cb.ecn.purdue.edu> kavuri@cb.ecn.purdue.edu (Surya N Kavuri) writes:
>
> Besides gradient and conjugate gradient methods there are others one
> could try. There are methods known as Quasi-Newton methods that are
> known to perform much better in non-linear optimization. It is
> because they use higher order derivatives besides the first (as in
> gradient methods), and thus use more knowledge of the objective
> function contours. Second and higher order derivatives can be thought
> of as indicators of error (surfaces) arising from gradient
> application. This additional knowledge is used to achieve much more
> rapid convergence.
> The computational difficulty with quasi-Newton approaches is the
> evaluation of the Hessian.
> There are inexpensive updating methods such as the BFGS
> (Broyden-Fletcher-Goldfarb-Shanno) algorithm.
> (most NLP books should have this)
> Surya

I don't think you want to use Quasi-Newton methods such as BFGS for
most neural-net problems. They require O(N^2) storage, where N is the
number of _weights_, and each iteration requires at least O(N^2)
operations. If you're dealing with small N, this is usually
insignificant next to the time it takes to evaluate your functions,
but for large N it might be a problem.

I use conjugate-gradient, which needs only O(N) storage and so doesn't
have this problem, but probably requires more function evaluations
than BFGS by a moderate, but not huge, factor.

Matt K.
mbkennel@phoenix.princeton.edu
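
To make the storage argument above concrete, here is a rough Python sketch. It is not the poster's code. The function names (`bfgs_matrix_floats`, `cg_vector_floats`, `cg_minimize`), the Polak-Ribiere update, the one-step secant line search, and the toy quadratic objective are all assumptions chosen for illustration. The point is only that conjugate gradient carries a few length-N vectors, whereas BFGS carries a dense N x N matrix.

```python
def bfgs_matrix_floats(n):
    # Dense inverse-Hessian approximation kept by BFGS: N x N entries.
    return n * n


def cg_vector_floats(n):
    # Persistent state in the sketch below: weights, gradient, direction.
    return 3 * n


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cg_minimize(grad, w, iters=100, tol=1e-16):
    """Polak-Ribiere conjugate gradient with a one-step secant line search.

    grad is a hypothetical callable returning the gradient as a list.
    Only length-N lists are ever stored; no N x N matrix is formed.
    """
    g = grad(w)
    d = [-gi for gi in g]
    for _ in range(iters):
        if dot(g, g) < tol:
            break
        # Estimate the curvature along d from one extra gradient call.
        g_probe = grad([wi + di for wi, di in zip(w, d)])
        curvature = dot(d, [p - gi for p, gi in zip(g_probe, g)])
        # Crude fallback step if the curvature estimate is not positive.
        step = -dot(g, d) / curvature if curvature > 0 else 1e-3
        w = [wi + step * di for wi, di in zip(w, d)]
        g_new = grad(w)
        # Polak-Ribiere coefficient, clipped at zero (acts as a restart).
        diff = [a - b for a, b in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return w


# Toy objective (assumed): f(w) = sum_i s_i * (w_i - 1)^2 / 2, minimum at w_i = 1.
scales = [1.0, 10.0, 100.0]


def toy_grad(w):
    return [s * (wi - 1.0) for s, wi in zip(scales, w)]


w_min = cg_minimize(toy_grad, [0.0, 0.0, 0.0])
print(w_min)
print(bfgs_matrix_floats(10000), "floats for BFGS vs", cg_vector_floats(10000), "for CG")
```

The secant line search costs one extra gradient evaluation per iteration, which fits the remark that conjugate gradient tends to need more function evaluations than BFGS. A real network would likely want a more careful line search than this.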