Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!usc!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!jhunix!ins_atge
From: ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards)
Newsgroups: comp.ai.neural-nets
Subject: Re: Back-Propagation
Summary: Conjugate-Gradient
Message-ID: <5972@jhunix.HCF.JHU.EDU>
Date: 31 Jul 90 19:23:46 GMT
References: <7010@helios.TAMU.EDU>
Reply-To: ins_atge@jhunix.UUCP (Thomas G Edwards)
Distribution: usa
Organization: The Johns Hopkins University - HCF
Lines: 52

In article <7010@helios.TAMU.EDU> guansy@cs.tamu.edu (Sheng-Yih Guan) writes:
>
>In article <6985@helios.TAMU.EDU> vu2jok@cs.tamu.edu (Jogen K Pathak) writes:
>>We are encountering problems while training the different paradigms, especially
>>the Back-Propagation paradigm. The training is very time consuming and tedious.
>>Can anyone help us choose training parameter values that can
>>reduce the training time? We are working on pattern classification problems of
>>moderate size, e.g. 100 input attributes.

>In Fahlman and Lebiere's paper - The Cascade-Correlation Learning Architecture,
>they have tried to analyze the reasons why backprop learning is so slow and
>they have identified two major problems:
>	1. the step-size problem, and 
>	2. the moving target problem.

Fahlman and Lebiere's Cascade-Correlation learning is a definite
improvement over conventional backprop methods.  By building up the
network layer by layer, they reduce the backprop calculation to
dealing with a single weight layer at a time, which greatly speeds
up the process and also eliminates the moving target problem.
I find this algorithm very pleasing, as it explains how a multi-layered
neural system can be built up quickly.  Their TR has a wonderful example
of how C.C. learned the two-spiral problem.  The first layer splits the
input space in half, the second forms a few big receptive fields,
and each layer after that forms receptive fields which come closer and closer
to exactly partitioning the input space into the two separate spirals.

The single weight layer learning is done with Quickprop (which
could also be used by itself in a multi-layer network).  This method
uses approximate second-order information, estimated from the current
and previous gradients, to determine the next step.  (In my experience,
Cascade-Correlation performs much worse using perceptron learning
than it does with Quickprop.)
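
To make the Quickprop idea concrete, here is a minimal sketch in Python.
It is a simplified illustration, not Fahlman's code: each weight is
treated independently, the error curve for that weight is assumed to be
a parabola through the last two slopes, and the step jumps toward its
minimum.  The names quickprop_step and minimize_quickprop, the learning
rate lr, the growth limit mu, and the fallback rules are all my own
assumptions for illustration.

```python
def quickprop_step(g, prev_g, prev_step, lr=0.1, mu=1.75):
    """Simplified Quickprop update for a single weight.

    g, prev_g  -- current and previous slope dE/dw for this weight
    prev_step  -- the previous change applied to this weight
    lr, mu     -- fallback learning rate and maximum growth factor
                  (illustrative values, not tuned)
    """
    if prev_step == 0.0 or prev_g == g:
        return -lr * g                    # no usable history: gradient step
    step = prev_step * g / (prev_g - g)   # jump to the fitted parabola's minimum
    if step * g > 0.0:
        return -lr * g                    # parabola points uphill: gradient step
    limit = mu * abs(prev_step)           # never grow more than mu times
    return max(-limit, min(limit, step))


def minimize_quickprop(grad_fn, w, iters=30):
    """Apply quickprop_step to every weight in the list w."""
    prev_g = [0.0] * len(w)
    prev_step = [0.0] * len(w)
    for _ in range(iters):
        g = grad_fn(w)
        step = [quickprop_step(gi, pgi, psi)
                for gi, pgi, psi in zip(g, prev_g, prev_step)]
        w = [wi + si for wi, si in zip(w, step)]
        prev_g, prev_step = g, step
    return w


# Toy error surface E(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2, minimum at (3, -1).
def toy_grad(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w_final = minimize_quickprop(toy_grad, [0.0, 0.0])
```

On a truly quadratic surface like this toy one, the parabola fit is
exact, so each weight reaches its minimum within a few steps once the
growth limit stops clipping.  A real network's error surface is not
separable or quadratic, which is why the full method needs extra
safeguards beyond the ones sketched here.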

However, there is another ftpable answer.  Conjugate-gradient methods
are well known for their ability to determine function minima in
numerical analysis.  Check out the chapter in _Numerical_Recipes_
on function minimization for an explanation and comparison with other
methods, such as steepest-descent.  A conjugate gradient program called OPT is
available by anonymous ftp from cse.ogc.edu in the /pub/nnvowels
directory.  
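
For readers without the book at hand, here is a minimal sketch of the
conjugate-gradient idea in Python.  It is not the OPT program and makes
a strong simplifying assumption: the error surface is an exact quadratic
0.5*x.A.x - b.x with A symmetric positive-definite, so the minimum along
each search direction has a closed form.  A network trainer would have
to replace that with a numerical line search.  The function names and
the Fletcher-Reeves choice of beta are my own picks for illustration.

```python
def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, x, iters):
    """Minimize 0.5*x.A.x - b.x (A symmetric positive-definite)."""
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient A x - b
    d = [-gi for gi in g]                              # first direction: steepest descent
    for _ in range(iters):
        gg = dot(g, g)
        if gg == 0.0:
            break                                      # already at the minimum
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)                # exact minimum along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / gg                  # Fletcher-Reeves formula
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x


# Two-variable example: the minimum is where A x = b, i.e. x = (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_min = conjugate_gradient(A, b, [2.0, 1.0], iters=2)
```

With exact line minimization on an n-dimensional quadratic, the method
reaches the minimum in at most n steps (two here, up to rounding),
whereas steepest descent generally zigzags toward it and never lands
exactly.  That is the main reason it compares so well.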

I have used this program to develop a threat determination network
using infrared temporal intensity data (128 or 256 inputs, 8-32
hidden units, 2 outputs; it takes about 1 minute to learn 20
exemplars, but I am running on a Convex).

I would like to see a comparison of OPT vs. Cascade-Correlation with
Quickprop learning (over many runs since, as we all know,
backpropagation is sensitive to initial conditions).
In fact, I might just try this myself.

-Thomas Edwards