Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!jhunix!ins_atge
From: ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards)
Newsgroups: comp.ai.neural-nets
Subject: Re: State of the Art Feed-Forward Network Training Algorithms
Summary: CG etc
Message-ID: <8412@jhunix.HCF.JHU.EDU>
Date: 18 May 91 03:24:18 GMT
References: <AJ3U.91May17010658@opal.cs.virginia.edu> <1991May17.090435.9180@fwi.uva.nl> <GREENBA.91May17100557@gambia.crd.ge.com>
Organization: The Johns Hopkins University - HCF
Lines: 36

In article <GREENBA.91May17100557@gambia.crd.ge.com> greenba@gambia.crd.ge.com (ben a green) writes:
>In article <1991May17.090435.9180@fwi.uva.nl> smagt@fwi.uva.nl (Patrick van der Smagt) writes
>   aj3u@opal.cs.virginia.edu (Asim Jalis) writes:
>   >What is the state of the art in training feed-forward networks.

>   I myself haven't used error back-propagation for over a year, but CG
>   instead.  It sizzles.

>My implementation of CG trained to 90% on this problem in 1676 presentations
>of the training set. That's a factor of 89 faster than backprop.

I think it is important to point out that backpropagation refers to
a method of computing the error gradient w.r.t. the weights, not to
the optimizer that uses that gradient.  Given the gradient, one might
use simple gradient descent, steepest descent with line search,
CG (which really rocks when properly done), or modified Newton methods
(which can go even faster than CG, but not by a heck of a lot).
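
To make the distinction concrete, here is a minimal sketch in Python
(standard library only).  Everything in it is my own illustrative
choice and not anybody's actual program: the 1-H-1 tanh network, the
weight layout, and the function names are hypothetical, and the CG
variant shown is Polak-Ribiere with a restart and a simple backtracking
line search.  The point is only that loss_and_grad() is the "backprop"
part, and either optimizer can consume what it returns.

```python
import math


def loss_and_grad(w, xs, ys):
    """Sum-of-squares error of a 1-H-1 tanh net and its gradient (backprop).

    Weight layout (an assumption of this sketch): for hidden unit j,
    w[3*j] is the input weight, w[3*j+1] the bias, w[3*j+2] the output
    weight; w[-1] is the output bias.
    """
    hidden = (len(w) - 1) // 3
    grad = [0.0] * len(w)
    loss = 0.0
    for x, y in zip(xs, ys):
        # Forward pass.
        h = [math.tanh(w[3 * j] * x + w[3 * j + 1]) for j in range(hidden)]
        out = sum(w[3 * j + 2] * h[j] for j in range(hidden)) + w[-1]
        err = out - y
        loss += 0.5 * err * err
        # Backward pass: propagate err back through each layer.
        grad[-1] += err
        for j in range(hidden):
            grad[3 * j + 2] += err * h[j]
            delta = err * w[3 * j + 2] * (1.0 - h[j] ** 2)
            grad[3 * j] += delta * x
            grad[3 * j + 1] += delta
    return loss, grad


def gradient_descent(f, w, steps=200, lr=0.01):
    """Simple fixed-step gradient descent using the gradient from f."""
    for _ in range(steps):
        _, g = f(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def line_search(f, w, d, g, f0, step=1.0):
    """Backtracking (Armijo) search along descent direction d."""
    slope = sum(gi * di for gi, di in zip(g, d))
    while step > 1e-12:
        trial = [wi + step * di for wi, di in zip(w, d)]
        if f(trial)[0] <= f0 + 1e-4 * step * slope:
            return trial
        step *= 0.5
    return w  # no acceptable step found; stay put


def conjugate_gradient(f, w, steps=200):
    """Polak-Ribiere CG (beta clipped at 0) using the same gradient."""
    f0, g = f(w)
    d = [-gi for gi in g]
    for _ in range(steps):
        if sum(gi * di for gi, di in zip(g, d)) >= 0.0:
            d = [-gi for gi in g]  # not a descent direction: restart
        w = line_search(f, w, d, g, f0)
        f0, g_new = f(w)
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:
            break
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / gg)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return w
```

Both optimizers take the same f, e.g. f = lambda w: loss_and_grad(w,
xs, ys), so swapping gradient descent for CG changes nothing about how
the gradient is computed.  A production CG would use a more careful
line search than the backtracking one sketched here.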

Someone at Oregon Graduate Institute used to have a good CG program
available via anonymous ftp.  I have used that implementation, and
it was exceedingly fast.  Yes, you'd find local minima in small problems,
but in most normal-size problems there were none.

I would recommend that people interested in training NNs look into
Cascade-Correlation (Fahlman, TR available on cheops.cis.ohio-state.edu
in /pub/neuroprose I believe).  It builds up a network with a small
number of hidden units (relatively small; I don't think it is
optimally minimal), and all learning is done on a single layer of
weights at a time, so there is no nasty backprop pass.  It is
exceedingly fast, especially if you use something better than simple
gradient descent on the correlation and error minimization.
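
For those who haven't seen it, the "correlation" step trains a
candidate hidden unit to maximize S = sum over outputs o of
|sum over patterns p of (V_p - Vbar)(E_po - Ebar_o)|, where V_p is the
candidate's output and E_po the residual error at output o.  Below is
a rough Python sketch of that score and its gradient for one tanh
candidate.  The function names, the data layout, and the crude
accept-or-halve ascent loop are my own assumptions for illustration;
this is not Fahlman's code, and a real implementation would use a
better optimizer and a pool of candidates.

```python
import math


def candidate_score_and_grad(w, inputs, residuals):
    """Correlation score S for one tanh candidate unit, and dS/dw.

    inputs[p]    : values feeding the candidate for pattern p
                   (include a constant 1.0 if you want a bias)
    residuals[p] : residual output errors E[p][o] for pattern p
    """
    n_pat, n_out = len(inputs), len(residuals[0])
    v = [math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for x in inputs]
    v_bar = sum(v) / n_pat
    e_bar = [sum(r[o] for r in residuals) / n_pat for o in range(n_out)]
    # Covariance of candidate output with each output's residual error.
    cov = [sum((v[p] - v_bar) * (residuals[p][o] - e_bar[o])
               for p in range(n_pat)) for o in range(n_out)]
    score = sum(abs(c) for c in cov)
    sign = [1.0 if c >= 0.0 else -1.0 for c in cov]
    grad = [0.0] * len(w)
    for p in range(n_pat):
        # Only the candidate's own incoming weights are involved, so
        # there is no pass back through earlier layers.
        back = sum(sign[o] * (residuals[p][o] - e_bar[o])
                   for o in range(n_out)) * (1.0 - v[p] ** 2)
        for i, xi in enumerate(inputs[p]):
            grad[i] += back * xi
    return score, grad


def train_candidate(w, inputs, residuals, steps=50, lr=0.5):
    """Gradient ascent on S; a step is kept only if S improves."""
    score, g = candidate_score_and_grad(w, inputs, residuals)
    for _ in range(steps):
        trial = [wi + lr * gi for wi, gi in zip(w, g)]
        t_score, t_g = candidate_score_and_grad(trial, inputs, residuals)
        if t_score > score:
            w, score, g = trial, t_score, t_g
        else:
            lr *= 0.5  # step too long: shrink and retry
    return w, score
```

Once the candidate is trained, its input weights are frozen and it is
added to the net; only the output weights are then retrained, which is
again a single-layer problem.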

Cascade-Correlation has recently been extended to recurrent nets,
and I plan to see how it works on a solar activity predictor over the
next 3 months.

-Thomas Edwards