Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!ucsd!ucbvax!bloom-beacon!eru!hagbard!sunic!dkuug!daimi!fodslett
From: fodslett@daimi.aau.dk (Martin Moller)
Newsgroups: comp.ai.neural-nets
Subject: Scaled Conjugate Gradient (SCG). Preprint soon available.
Keywords: supervised learning, feedforward neural networks.
Message-ID: <1990Nov17.130731.12392@daimi.aau.dk>
Date: 17 Nov 90 13:07:31 GMT
Sender: fodslett@daimi.aau.dk (Martin Moller)
Organization: DAIMI: Computer Science Department, Aarhus University, Denmark
Lines: 54

Recently there has been a lot of interest in learning algorithms that are
faster than back-propagation.

Newton's method and conjugate gradient methods have been mentioned.
Common to these algorithms is that they raise the computational cost
per learning iteration considerably: Newton's method by inverting the Hessian
matrix of the error function, and the conjugate gradient methods by
performing a line search in order to determine a good step size.

For the past year and a half I have been working on a conjugate
gradient method that avoids this line search.
Instead of a line search, I use a scaling technique. The algorithm has
been in use for more than half a year now and works well.
In each iteration it has to calculate the gradient of the error twice.
For comparison, standard conjugate gradient methods using line search
need on average 4-20 gradient calculations per iteration.
The algorithm is an order of magnitude faster than back-propagation and
seems to handle ravine phenomena much more effectively than BP.
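
To make the "two gradient calculations per iteration" concrete, here is a
minimal Python sketch of one way second-order information can be obtained
from gradients alone, using O(N) memory: a forward-difference estimate of the
Hessian-vector product. This is an illustration under assumptions, not the
algorithm from the preprint. The names hessian_vector_product, sigma and lam,
the damping term lam * p (one possible reading of "scaling"), and the
quadratic test function are all hypothetical.

```python
import numpy as np

def hessian_vector_product(grad, w, p, sigma=1e-4, lam=0.0):
    """Estimate H(w) @ p from two gradient evaluations (forward difference).

    lam adds a damping term lam * p; it is a hypothetical stand-in for a
    scaling mechanism and defaults to zero.
    """
    # Scale the probe step by the length of p so the perturbation stays small.
    sigma_k = sigma / np.linalg.norm(p)
    return (grad(w + sigma_k * p) - grad(w)) / sigma_k + lam * p

# Hypothetical test error: E(w) = 0.5 * w^T A w, so grad E(w) = A w and H = A.
A = np.array([[100.0, 0.0], [0.0, 1.0]])
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
p = -grad(w)                            # search direction (steepest descent here)
s = hessian_vector_product(grad, w, p)  # approximates A @ p
delta = p @ s                           # curvature of E along p
alpha = -(p @ grad(w)) / delta          # step size from the local quadratic model
```

In this sketch the step size comes from the curvature estimate delta rather
than from repeated trial steps along p, and the Hessian itself is never
formed or stored.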

A preprint describing this algorithm in detail will soon be available
by ftp (in 1 or 2 weeks). A short abstract of the paper follows:

Abstract-- A supervised learning algorithm (Scaled Conjugate Gradient, SCG)
with superlinear convergence rate is introduced. SCG uses second order 
information from the neural network, but requires only O(N) memory usage.
The performance of SCG is benchmarked against the performance of the standard
backpropagation algorithm (BP) and several recently proposed standard conjugate
gradient algorithms. SCG yields a speed-up of at least an order of magnitude
relative to BP. The speed-up depends on the convergence criterion, i.e., the
greater the demanded reduction in error, the bigger the speed-up. SCG is fully
automated, with no user-dependent parameters, and avoids the time-consuming
line search that other conjugate gradient algorithms use in order to
determine a good step size.
Incorporating problem-dependent structural information in the architecture of
a neural network often lowers the overall complexity. The smaller the
complexity of the neural network relative to the problem domain, the bigger the
possibility that the weight space contains long ravines characterized by sharp
curvature. While BP is inefficient on these ravine phenomena, SCG handles them
effectively.
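
The ravine effect can be seen on a toy problem. The Python sketch below is an
illustration only: it uses a hypothetical two-dimensional quadratic error
with very different curvature in its two directions, and it compares
fixed-step gradient descent with the textbook linear conjugate gradient
method, not with SCG itself. All names and constants are assumptions.

```python
import numpy as np

# Hypothetical ravine: E(w) = 0.5 * w^T A w - b^T w, curvature ratio 1000:1.
A = np.diag([1000.0, 1.0])
b = np.array([1.0, 1.0])

def grad(w):
    return A @ w - b

def gradient_descent(w, lr, steps):
    """Fixed-step steepest descent; lr must stay below 2/1000 to be stable."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def conjugate_gradient(w, steps):
    """Textbook linear CG with exact step sizes for a quadratic error."""
    r = -grad(w)              # residual = negative gradient
    p = r.copy()              # first direction is steepest descent
    for _ in range(steps):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        w = w + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p  # new direction, conjugate to the previous one
        r = r_new
    return w

w0 = np.zeros(2)
w_star = np.linalg.solve(A, b)   # exact minimum, for measuring the error
gd_err = np.linalg.norm(gradient_descent(w0, lr=1.9e-3, steps=100) - w_star)
cg_err = np.linalg.norm(conjugate_gradient(w0, steps=2) - w_star)
```

Here the steep direction limits the step size gradient descent can use, so
after 100 steps it has barely moved along the flat floor of the ravine, while
conjugate gradient reaches the minimum of this two-dimensional quadratic in
two steps.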


	Martin

NB. Any questions or comments on this short note or on the preprint that will
follow shortly would be appreciated.

______________________________ 

Martin Fodslette Moller
Computer Science Dept.
University of Aarhus
Denmark
e-mail: fodslett@daimi.aau.dk
_______________________________