Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!ucsd!ucbvax!bloom-beacon!eru!hagbard!sunic!dkuug!daimi!fodslett
From: fodslett@daimi.aau.dk (Martin Moller)
Newsgroups: comp.ai.neural-nets
Subject: Scaled Conjugate Gradient (SCG). Preprint soon available.
Keywords: supervised learning, feedforward neural networks.
Message-ID: <1990Nov17.130731.12392@daimi.aau.dk>
Date: 17 Nov 90 13:07:31 GMT
Sender: fodslett@daimi.aau.dk (Martin Moller)
Organization: DAIMI: Computer Science Department, Aarhus University, Denmark
Lines: 54

Recently there has been a lot of interest in learning algorithms that
are faster than backpropagation. Newton's method and conjugate gradient
methods have been mentioned. Common to these algorithms is that they
raise the computational cost per learning iteration considerably:
Newton's method by inverting the Hessian matrix of the error function,
and the conjugate gradient methods by performing a line search in order
to determine a good step size.

For the past year and a half I have been working on a conjugate gradient
method that avoids this line search. Instead of a line search, I use a
scaling technique. The algorithm has now been in use for more than half
a year and works well. In each iteration it has to calculate the
gradient of the error twice. For comparison, standard conjugate gradient
methods using line search need on average 4-20 calculations of the
gradient per iteration. The algorithm is an order of magnitude faster
than backpropagation (BP) and seems to handle ravine phenomena much more
effectively than BP.

A preprint describing this algorithm in detail will soon be available
(in 1 or 2 weeks) by ftp. Here follows a short abstract of the paper:

Abstract-- A supervised learning algorithm (Scaled Conjugate Gradient,
SCG) with superlinear convergence rate is introduced. SCG uses second
order information from the neural network, but requires only O(N)
memory usage.
The performance of SCG is benchmarked against the performance of the
standard backpropagation algorithm (BP) and several recently proposed
standard conjugate gradient algorithms. SCG yields a speed-up of at
least an order of magnitude relative to BP. The speed-up depends on the
convergence criterion, i.e., the greater the demanded reduction in
error, the bigger the speed-up. SCG is fully automated, includes no
user-dependent parameters, and avoids the time-consuming line search
that other conjugate gradient algorithms use in order to determine a
good step size.
Incorporating problem-dependent structural information in the
architecture of a neural network often lowers the overall complexity.
The smaller the complexity of the neural network relative to the problem
domain, the bigger the possibility that the weight space contains long
ravines characterized by sharp curvature. While BP is inefficient on
these ravine phenomena, SCG handles them effectively.

Martin

NB. Any questions or comments on this short note or on the preprint
following shortly would be appreciated.

______________________________
Martin Fodslette Moller
Computer Science Dept.
University of Aarhus
Denmark
e-mail: fodslett@daimi.aau.dk
_______________________________
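
The post states that each iteration needs two gradient calculations, uses
second order information, and stores only O(N) values. The Python sketch
below illustrates one way those three properties can fit together; it is
not taken from the preprint and may differ from it. The difference of two
gradients taken a small distance apart along the search direction
approximates the curvature in that direction, and the curvature gives a
step size without a line search. The toy error function, the names grad,
dot, scaled_step, and the parameters lam (a damping term standing in for
the scaling) and sigma (the probe distance) are all assumptions made for
illustration.

```python
def grad(w):
    # Gradient of a toy error E(w) = 0.5 * (w0^2 + 100 * w1^2), a narrow ravine.
    return [w[0], 100.0 * w[1]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scaled_step(w, p, lam=0.0, sigma=1e-4):
    """Take one step along direction p using two gradient evaluations."""
    g0 = grad(w)                                   # first gradient evaluation
    w_probe = [wi + sigma * pi for wi, pi in zip(w, p)]
    g1 = grad(w_probe)                             # second gradient evaluation
    # Finite difference of gradients approximates (Hessian * p);
    # only vectors of length N are stored, never the N x N Hessian.
    s = [(b - a) / sigma for a, b in zip(g0, g1)]
    delta = dot(p, s) + lam * dot(p, p)            # damped curvature along p
    alpha = -dot(p, g0) / delta                    # step size, no line search
    w_new = [wi + alpha * pi for wi, pi in zip(w, p)]
    return w_new, alpha


w = [1.0, 1.0]
p = [-g for g in grad(w)]      # steepest-descent direction for a first step
w_new, alpha = scaled_step(w, p)
```

On this quadratic toy error with lam = 0 the step lands at the minimum
along p. A full conjugate gradient method would also update p between
iterations and adjust lam when the curvature estimate is not positive;
both are left out of this sketch.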