Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!dkuug!daimi!baronen
From: baronen@daimi.aau.dk (Carsten Greve)
Newsgroups: comp.ai.neural-nets
Subject: Are Conjugate Gradient algorithms any good?
Summary: Pattern versus Epoch update
Keywords: NETtalk, Conjugate Gradient algorithms, Back-propagation
Message-ID: <1991Mar4.142559.21857@daimi.aau.dk>
Date: 4 Mar 91 14:25:59 GMT
Sender: baronen@daimi.aau.dk (Carsten Greve)
Organization: DAIMI: Computer Science Department, Aarhus University, Denmark
Lines: 67

Recently there has been much talk about the so-called Conjugate Gradient
algorithms and their use in feed-forward neural networks. We have applied
one of these algorithms, the Scaled Conjugate Gradient (SCG) algorithm [1],
to the NETtalk problem [2], with poor results.

In their original experiment, Sejnowski and Rosenberg used the conventional
back-propagation algorithm with a weight update after each presentation of
one word (word update). Our experiments with the same algorithm confirmed
their results. However, our experiments also showed that back-propagation
was unable to converge if the weights were updated only after the entire
training set had been presented (epoch update).
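
The difference between the two update schemes can be sketched as follows.
This is only an illustration on a single linear unit with squared error, not
the NETtalk net and not the code used in the experiments. The function names,
the toy data, the learning rate, and the choice to sum (rather than average)
the gradient over an epoch are all assumptions made for the example.

```python
def predict(w, x):
    # Output of a single linear unit.
    return sum(wi * xi for wi, xi in zip(w, x))


def mean_error(w, data):
    # Average squared error per pattern.
    return sum(0.5 * (predict(w, x) - t) ** 2 for x, t in data) / len(data)


def train_pattern_update(data, w, lr, epochs):
    # Weights change after every single pattern presentation.
    w = list(w)
    updates = 0
    for _ in range(epochs):
        for x, t in data:
            err = predict(w, x) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            updates += 1
    return w, updates


def train_epoch_update(data, w, lr, epochs):
    # Gradient is accumulated over the whole training set,
    # and the weights change once per epoch.
    w = list(w)
    updates = 0
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in data:
            err = predict(w, x) - t
            grad = [g + err * xi for g, xi in zip(grad, x)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
        updates += 1
    return w, updates


# Hypothetical toy data: targets generated by the weights (1, -1).
data = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0),
        ((1.0, 1.0), 0.0), ((1.0, -1.0), 2.0)]
initial = [0.0, 0.0]

w_pattern, n_pattern = train_pattern_update(data, initial, lr=0.1, epochs=30)
w_epoch, n_epoch = train_epoch_update(data, initial, lr=0.1, epochs=30)
print(n_pattern, mean_error(w_pattern, data))
print(n_epoch, mean_error(w_epoch, data))
```

For the same number of pattern presentations, the first scheme makes one
update per pattern and the second makes one update per pass over the data.
On a toy problem like this both converge; the sketch shows only the
mechanics, not the behaviour we observed on NETtalk.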

In the SCG algorithm, weights are (normally) updated once after each
presentation of the entire training set (epoch update). Like several other
Conjugate Gradient algorithms, SCG is reported to outperform ordinary
back-propagation with epoch update. Yet we found that the SCG algorithm was
unable to match the performance of back-propagation with word update.

A modified version of the SCG algorithm permits more frequent weight updates.
The net is trained on a small block of patterns for some time, after which
the weights are updated. A new block of training patterns is then chosen,
the net is trained on that block, and so on. This version gave improved
results, but SCG was still unable to match standard back-propagation with
word update.
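
The block scheme can be pictured as a schedule that hands the optimizer one
block of patterns per weight update. The sketch below is an assumption-laden
illustration: the name block_schedule and its parameters are hypothetical,
and blocks are taken sequentially with wrap-around, although how new
patterns are chosen is not specified above. The SCG step itself is left out.

```python
def block_schedule(n_words, block_size, total_presentations, passes_per_block):
    """Yield one list of word indices per weight update.

    Assumes every pass over a block counts towards the total number
    of word presentations, and that blocks are consecutive slices of
    the training set, wrapping around at the end.
    """
    presentations_per_update = block_size * passes_per_block
    n_updates = total_presentations // presentations_per_update
    start = 0
    for _ in range(n_updates):
        yield [(start + i) % n_words for i in range(block_size)]
        start = (start + block_size) % n_words


# Small hypothetical example: 20 words, blocks of 5, 2 passes per block.
blocks = list(block_schedule(20, 5, 100, 2))
print(len(blocks), blocks[0], blocks[1])
```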

We have used a fully connected 3-layer net of 7 * 26 input units, 60 hidden
units, and 57 output units (one for each phoneme). The training set consists of
1000 words, and the net was trained for 30000 word presentations.
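
For a sense of scale, the number of adjustable parameters in such a net can
be counted as below. The count assumes connections only between adjacent
layers and one bias per hidden and output unit; neither detail is stated
above, so the totals are illustrative only.

```python
n_input = 7 * 26   # 7 * 26 input units
n_hidden = 60
n_output = 57      # one unit per phoneme

# Assumption: weights only between adjacent layers.
connection_weights = n_input * n_hidden + n_hidden * n_output
# Assumption: one bias per hidden and output unit.
biases = n_hidden + n_output
total_parameters = connection_weights + biases

print(n_input, connection_weights, total_parameters)
```

Under these assumptions the net has 182 input units, 14340 connection
weights, and 14457 parameters in total.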

                              Average error on each         Number of
                              pattern after 30000           weight
                              word presentations:           updates:

Back-prop with word update:   0.125                           30000
Back-prop with epoch update:  2.888 (failed to converge)         30
SCG with block update: *)     0.303                             300
SCG with epoch update:        0.871                              30

*) A block size of 10 words was used, with each block being
   trained 10 times before the weights were updated.
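
The update counts in the last column follow from the number of word
presentations consumed per weight update. The short check below assumes that
every pass over a block counts towards the 30000 word presentations, which
is the reading consistent with the 300 block updates in the table. The
variable and function names are made up for the example.

```python
TOTAL_PRESENTATIONS = 30000
TRAINING_WORDS = 1000
BLOCK_SIZE = 10
PASSES_PER_BLOCK = 10


def weight_updates(total_presentations, presentations_per_update):
    # Number of updates when each update consumes a fixed number
    # of word presentations.
    return total_presentations // presentations_per_update


updates = {
    "word update": weight_updates(TOTAL_PRESENTATIONS, 1),
    "epoch update": weight_updates(TOTAL_PRESENTATIONS, TRAINING_WORDS),
    "block update": weight_updates(TOTAL_PRESENTATIONS,
                                   BLOCK_SIZE * PASSES_PER_BLOCK),
}
for scheme, count in updates.items():
    print(scheme, count)
```

This gives 30000, 30, and 300 updates respectively, matching the table.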

Even after 200000 word presentations the SCG algorithm with epoch update
had only reduced the average error to 0.215.

We have also tested the SCG algorithm on the protein data that Ning Qian
and Sejnowski used [3], and again we found that SCG was unable to compete
with back-propagation with pattern update.

We would like to know whether any of you have successfully applied a Conjugate
Gradient algorithm to a large-scale problem and, if so, whether it
outperformed ordinary back-propagation with pattern update. We would also
like to know whether there exist any Conjugate Gradient algorithms that allow
the use of pattern update.


References:

1. Moller, M.F. (1990).  "A Scaled Conjugate Gradient Algorithm for Fast
Supervised Learning". Preprint. Available by ftp from
cheops.cis.ohio-state.edu in directory pub/neuroprose as
moller.conjugate-gradient.ps.Z

2. Sejnowski, T.J., and Rosenberg, C.R. (1987).  "Parallel networks that
learn to pronounce English text" in Complex Systems, 1, 145-168.

3. Qian, N., and Sejnowski, T.J. (1988).  "Predicting the Secondary
Structure of Globular Proteins Using Neural Network Models" in Journal of
Molecular Biology, 202, 865-884.  Academic Press.