Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!dkuug!daimi!baronen
From: baronen@daimi.aau.dk (Carsten Greve)
Newsgroups: comp.ai.neural-nets
Subject: Are Conjugate Gradient algorithms any good?
Summary: Pattern versus Epoch update
Keywords: NETtalk, Conjugate Gradient algorithms, Back-propagation
Message-ID: <1991Mar4.142559.21857@daimi.aau.dk>
Date: 4 Mar 91 14:25:59 GMT
Sender: baronen@daimi.aau.dk (Carsten Greve)
Organization: DAIMI: Computer Science Department, Aarhus University, Denmark
Lines: 67

Recently there has been much talk about the so-called Conjugate Gradient
algorithms and their use in feed-forward neural networks. We have applied
one of these algorithms, the Scaled Conjugate Gradient (SCG) algorithm [1],
to the NETtalk problem [2], with poor results.

In their original experiment, Sejnowski and Rosenberg used the conventional
back-propagation algorithm with weight updates after each presentation of
one word (word update). Our experiments with the same algorithm confirmed
the results of Sejnowski and Rosenberg. However, our experiments also showed
that back-propagation was unable to converge if the weights were updated
only after the entire training set had been presented (epoch update).

The SCG algorithm is reported, like several other Conjugate Gradient
algorithms, to outperform ordinary back-propagation with epoch update.
Yet we found that the SCG algorithm was unable to match the performance of
back-propagation when word updates were used instead.

In the SCG algorithm, weights are (normally) updated once after each
presentation of the entire training set (epoch update). A modified version
of the SCG algorithm permits more frequent weight updates. The net is
trained on a small number of patterns for some time, after which the
weights are updated. Then some new training patterns are chosen and the net
is trained on those patterns, et cetera. This version gave improved results.
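The difference between pattern update and epoch update can be sketched on a
toy problem. The code below is only an illustration under stated assumptions:
it uses plain gradient descent (not SCG and not back-propagation through a
hidden layer) on a one-dimensional linear model, and the names make_data,
mean_squared_error, train, and patterns_per_update are hypothetical. It
shows that, for a fixed budget of pattern presentations, pattern update
changes the weights far more often than epoch update does.

```python
import random

def make_data(n_patterns=200, seed=0):
    """Toy data for y = 3.0*x + 0.5 (a hypothetical stand-in for a training set)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_patterns)]
    ys = [3.0 * x + 0.5 for x in xs]
    return xs, ys

def mean_squared_error(w, b, xs, ys):
    """Average squared error of the model w*x + b over all patterns."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, presentations, patterns_per_update, lr=0.1):
    """Plain gradient descent on a linear model.

    Gradients are accumulated over `patterns_per_update` presentations and
    then applied once. patterns_per_update=1 is pattern update;
    patterns_per_update=len(xs) is epoch update.
    """
    w = b = 0.0
    grad_w = grad_b = 0.0
    updates = 0
    for t in range(presentations):
        i = t % len(xs)                      # cycle through the training set
        err = w * xs[i] + b - ys[i]
        grad_w += err * xs[i]
        grad_b += err
        if (t + 1) % patterns_per_update == 0:
            # apply the averaged gradient, then start accumulating afresh
            w -= lr * grad_w / patterns_per_update
            b -= lr * grad_b / patterns_per_update
            grad_w = grad_b = 0.0
            updates += 1
    return w, b, updates

xs, ys = make_data()
budget = 10 * len(xs)  # same number of pattern presentations for both runs
w_p, b_p, n_p = train(xs, ys, budget, patterns_per_update=1)
w_e, b_e, n_e = train(xs, ys, budget, patterns_per_update=len(xs))
print("pattern update:", n_p, "updates, error", mean_squared_error(w_p, b_p, xs, ys))
print("epoch update:  ", n_e, "updates, error", mean_squared_error(w_e, b_e, xs, ys))
```

On this convex toy problem both schedules would eventually converge, so the
sketch says nothing about whether epoch update can converge on NETtalk. It
only makes the update-frequency difference concrete.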
However, the SCG was still unable to match standard back-propagation with
word updates.

We have used a fully connected 3-layer net of 7 * 26 input units, 60 hidden
units, and 57 output units (one for each phoneme). The training set consists
of 1000 words, and the net was trained for 30000 word presentations.

                                Average error on each pattern    Number of
                                after 30000 word presentations:  weight updates:

Back-prop with word update:     0.125                            30000
Back-prop with epoch update:    2.888 (failed to converge)       30
SCG with block update: *)       0.303                            300
SCG with epoch update:          0.871                            30

*) A block size of 10 words was used, with each block being trained 10 times
   before weights were updated.

Even after 200000 word presentations the SCG algorithm with epoch update had
only reduced the average error to 0.215.

We have also tested the SCG algorithm on the protein data that Ning Qian and
Sejnowski used [3], and again we found that SCG was unable to compete with
back-propagation with pattern update.

We would like to know whether any of you have successfully applied a
Conjugate Gradient algorithm to a large-scale problem and, if so, whether it
outperformed ordinary back-propagation with pattern update. We would also
like to know whether there exist any Conjugate Gradient algorithms that
allow the use of pattern update.

References:

1. Moller, Martin F. (1990) "A Scaled Conjugate Gradient Algorithm for Fast
   Supervised Learning". Preprint. Available by ftp from
   cheops.cis.ohio-state.edu in directory pub/neuroprose as
   moller.conjugate-gradient.ps.Z

2. Sejnowski, T.J., and Rosenberg, C.R. (1987) "Parallel networks that learn
   to pronounce English text" in Complex Systems, 1, 145-168.

3. Ning Qian and Terrence J. Sejnowski (1988) "Predicting the Secondary
   Structure of Globular Proteins Using Neural Network Models" in Journal of
   Molecular Biology 202, 865-884. Academic Press.
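
The weight-update counts reported above follow directly from the figures
given in the post (30000 word presentations, 1000 training words, blocks of
10 words trained 10 times each). A minimal check, in which the function name
weight_updates and its parameters are hypothetical:

```python
def weight_updates(presentations, words_per_update, repeats=1):
    """Number of weight updates for a fixed budget of word presentations.

    Each update consumes words_per_update * repeats presentations.
    """
    return presentations // (words_per_update * repeats)

PRESENTATIONS = 30000   # word presentations, as stated in the post
TRAINING_WORDS = 1000   # size of the training set, as stated in the post

word_update = weight_updates(PRESENTATIONS, 1)
epoch_update = weight_updates(PRESENTATIONS, TRAINING_WORDS)
block_update = weight_updates(PRESENTATIONS, 10, repeats=10)
print(word_update, epoch_update, block_update)
```

The three values correspond to the word, epoch, and block rows of the
results table, respectively.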