Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!njin!princeton!phoenix!mbkennel
From: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel)
Newsgroups: comp.ai.neural-nets
Subject: Re: Training
Message-ID: <7378@phoenix.Princeton.EDU>
Date: 25 Mar 89 02:19:57 GMT
References: <2698@sun.soe.clarkson.edu> <2351@buengc.BU.EDU> <1577@vicom.COM> <7326@phoenix.Princeton.EDU> <774@cb.ecn.purdue.edu>
Reply-To: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel)
Organization: Princeton University, NJ
Lines: 26

In article <774@cb.ecn.purdue.edu> kavuri@cb.ecn.purdue.edu (Surya N Kavuri ) writes:
>
> Besides gradient and conjugate-gradient methods, there are others one
> could try.  Quasi-Newton methods are known to perform much better in
> non-linear optimization.  This is because they use higher-order
> derivative information in addition to the first derivatives (as in
> gradient methods), and thus use more knowledge of the objective
> function's contours.  Second and higher order derivatives can be
> thought of as indicators of the error (surfaces) arising from gradient
> application.  This additional knowledge is used to achieve much more
> rapid convergence.
> The computational difficulty with quasi-Newton approaches is the
> evaluation of the Hessian.  There are inexpensive updating methods such
> as the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm.
>      (most NLP books should have this)
>                                           Surya


I don't think you want to use Quasi-Newton methods such as BFGS for
most neural-net problems.  They require O(N^2) storage where N is the
number of _weights_, and in each iteration require at least
O(N^2) operations.  If you're dealing with small N, this is usually
insignificant next to the time to evaluate your functions, but for
large N, it might be a problem.  
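
To put rough numbers on the O(N^2) storage, here is a small sketch. It
assumes 8-byte floats, a dense N x N matrix for the quasi-Newton
approximation, and about four length-N work vectors for conjugate
gradient. The function names and those figures are illustrative
assumptions, not taken from any particular implementation.

```python
def quasi_newton_storage_bytes(n_weights, bytes_per_float=8):
    """Bytes for one dense N x N (inverse-)Hessian approximation, as in BFGS."""
    return n_weights * n_weights * bytes_per_float


def conjugate_gradient_storage_bytes(n_weights, bytes_per_float=8, n_vectors=4):
    """Bytes for a few length-N vectors (assumed: weights, gradient,
    previous gradient, search direction)."""
    return n_vectors * n_weights * bytes_per_float


# Compare the two as the number of weights grows.
for n in (100, 10_000, 1_000_000):
    print(n, quasi_newton_storage_bytes(n), conjugate_gradient_storage_bytes(n))
```

At N = 10,000 weights the dense matrix alone is 800,000,000 bytes under
these assumptions, against 320,000 bytes for the four vectors.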

I use conjugate-gradient, which doesn't have this problem but probably
requires more function evaluations than BFGS by a moderate, but not
huge factor.
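
For illustration, a minimal sketch of a nonlinear conjugate-gradient
loop follows. It is not the code referred to above: the Polak-Ribiere
update, the restart rule, the backtracking line search, and the toy
quadratic objective are all choices made for this example. The point is
only that the state is a handful of length-N lists and no N x N matrix
is ever formed.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(f, grad, w, iters=500, tol=1e-10):
    """Polak-Ribiere nonlinear CG with a backtracking (Armijo) line search."""
    g = grad(w)
    d = [-gi for gi in g]                  # start along steepest descent
    for _ in range(iters):
        if dot(g, g) < tol:                # gradient small enough: stop
            break
        slope = dot(g, d)
        if slope >= 0:                     # not a descent direction: restart
            d = [-gi for gi in g]
            slope = -dot(g, g)
        # Halve the step until the function decreases sufficiently.
        step, f0 = 1.0, f(w)
        while step > 1e-12 and \
                f([wi + step * di for wi, di in zip(w, d)]) > f0 + 1e-4 * step * slope:
            step *= 0.5
        w = [wi + step * di for wi, di in zip(w, d)]
        g_new = grad(w)
        # Polak-Ribiere coefficient, clipped at zero.
        diff = [a - b for a, b in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return w


# Toy objective standing in for a network's error function (an assumption).
def f(w):
    return (w[0] - 1.0) ** 2 + 10.0 * (w[1] + 2.0) ** 2


def grad(w):
    return [2.0 * (w[0] - 1.0), 20.0 * (w[1] + 2.0)]


print(conjugate_gradient(f, grad, [0.0, 0.0]))
```

Each pass through the loop costs a few function evaluations in the line
search plus one gradient evaluation, and everything else is O(N) work.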

Matt K.
mbkennel@phoenix.princeton.edu