Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!usc!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!jhunix!ins_atge
From: ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards)
Newsgroups: comp.ai.neural-nets
Subject: Re: Back-Propagation
Summary: Conjugate-Gradient
Message-ID: <5972@jhunix.HCF.JHU.EDU>
Date: 31 Jul 90 19:23:46 GMT
References: <7010@helios.TAMU.EDU>
Reply-To: ins_atge@jhunix.UUCP (Thomas G Edwards)
Distribution: usa
Organization: The Johns Hopkins University - HCF
Lines: 52

In article <7010@helios.TAMU.EDU> guansy@cs.tamu.edu (Sheng-Yih Guan) writes:
>
>In article <6985@helios.TAMU.EDU> vu2jok@cs.tamu.edu (Jogen K Pathak) writes:
>>We are encountering problems while training the different paradigms, especially
>>the Back-Propagation paradigm. The training is very time consuming and tedious.
>>Can anyone help us choose training parameter values that can
>>reduce the training sessions? We are working on pattern classification of
>>moderate size, e.g. 100 input attributes.
>In Fahlman and Lebiere's paper, "The Cascade-Correlation Learning Architecture",
>they have tried to analyze the reasons why backprop learning is so slow, and
>they have identified two major problems:
>   1. the step-size problem, and
>   2. the moving target problem.

Fahlman and Lebiere's Cascade-Correlation learning is a definite improvement
over conventional backprop methods. By building up the network one hidden
unit (and so one layer) at a time, they reduce the backprop calculation to
training a single layer of weights at a time, which greatly speeds up the
process and also eliminates the moving target problem. I find this algorithm
very pleasing, as it explains how a multi-layered neural system can be built
up quickly. Their TR has a wonderful example of how C.C. learned the
two-spiral problem.
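For anyone who hasn't read the TR yet: each new hidden unit is picked from a pool of candidates by how well its output tracks the network's remaining error, S = sum over outputs o of |sum over patterns p of (V_p - mean V)(E_po - mean E_o)|. Here is a minimal pure-Python sketch of just that scoring step, under my own assumptions: the function names and the toy error values are made up for illustration, and the real algorithm trains each candidate's input weights to maximize S (with Quickprop) before freezing the winner and retraining the output layer, which this sketch does not do. Note that the absolute value means a candidate that is strongly anti-correlated with the error scores just as well.

```python
# Sketch of the Cascade-Correlation candidate score (after Fahlman & Lebiere):
#   S = sum over outputs o of | sum over patterns p of (V_p - V_mean) * (E_po - E_mean_o) |
# V_p  = candidate unit's output on pattern p
# E_po = network's residual error at output o on pattern p
# Function names here are hypothetical, chosen for this illustration.


def candidate_score(candidate_outputs, residual_errors):
    """Return S for one candidate unit.

    candidate_outputs: list of floats, one per training pattern.
    residual_errors:   list of rows (one per pattern), each row holding
                       one residual error per output unit.
    """
    n_patterns = len(candidate_outputs)
    n_outputs = len(residual_errors[0])
    v_mean = sum(candidate_outputs) / n_patterns
    score = 0.0
    for o in range(n_outputs):
        e_mean = sum(row[o] for row in residual_errors) / n_patterns
        covariance = sum(
            (v - v_mean) * (row[o] - e_mean)
            for v, row in zip(candidate_outputs, residual_errors)
        )
        score += abs(covariance)  # sign does not matter, only magnitude
    return score


def best_candidate(candidates, residual_errors):
    """Index of the candidate whose output best tracks the residual error."""
    return max(range(len(candidates)),
               key=lambda i: candidate_score(candidates[i], residual_errors))


if __name__ == "__main__":
    errors = [[1.0], [-1.0], [1.0], [-1.0]]   # made-up residuals, one output
    tracking = [0.9, 0.1, 0.9, 0.1]            # rises and falls with the error
    constant = [0.5, 0.5, 0.5, 0.5]            # carries no information (S = 0)
    print(candidate_score(tracking, errors))
    print(best_candidate([constant, tracking], errors))  # -> 1
```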
The first layer splits the input space in half, the second forms a few big
receptive fields, and each layer after that forms receptive fields which come
closer and closer to exactly partitioning the input space into the two
separate spirals.

The single-weight-layer learning is done with Quickprop (which could also be
used by itself in a multi-layer network). This method uses second-order
information, an estimate of the curvature of the error taken from successive
gradient values, to determine the next step. (In my experience,
Cascade-Correlation performs much worse using perceptron learning than using
Quickprop.)

However, there is another ftpable answer. Conjugate-gradient methods are well
known in numerical analysis for their ability to find function minima. Check
out the chapter on function minimization in _Numerical_Recipes_ for an
explanation and a comparison with other methods, such as steepest descent.

A conjugate-gradient program called OPT is available by anonymous ftp from
cse.ogc.edu in the /pub/nnvowels directory. I have used this program to
develop a threat determination network using infrared temporal intensity data
(128 or 256 inputs, 8-32 hidden units, 2 outputs; it takes about 1 minute to
learn 20 exemplars, but I am running on a Convex).

I would like to see a comparison of OPT vs. Cascade-Correlation with Quickprop
learning, over many runs (as we all know, backpropagation is sensitive to
initial conditions). In fact, I might just try this myself.

-Thomas Edwards
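To make the Quickprop idea concrete: for each weight, the current slope dE/dw, the previous slope, and the previous step define a parabola, and the next step jumps toward its minimum, step(t) = S(t) / (S(t-1) - S(t)) * step(t-1), with the step never allowed to grow more than mu times the previous one. The sketch below is a simplified version written from that rule, not Fahlman's code: I assume no weight decay, fall back to plain gradient descent whenever the previous step was zero or the slopes are equal, and leave out the extra gradient term the full rule adds in some cases. The function names, lr = 0.1, mu = 1.75, and the toy error function are all illustrative choices.

```python
# Simplified per-weight Quickprop update (after Fahlman); a sketch only.
#   step(t) = S(t) / (S(t-1) - S(t)) * step(t-1),  S = dE/dw
# Assumptions: no weight decay; plain gradient descent when the previous step
# is zero or the two slopes are equal; the full rule's extra gradient term
# is omitted. Names and parameter values are hypothetical.


def quickprop_step(grad, prev_grad, prev_step, lr=0.1, mu=1.75):
    """Return the next step for every weight.

    lr: learning rate used by the gradient-descent fallback.
    mu: maximum growth factor; |step| <= mu * |previous step|.
    """
    steps = []
    for g, pg, ps in zip(grad, prev_grad, prev_step):
        denom = pg - g
        if ps != 0.0 and denom != 0.0:
            step = g / denom * ps                 # jump to the parabola's minimum
            limit = mu * abs(ps)
            step = max(-limit, min(limit, step))  # cap how fast steps can grow
        else:
            step = -lr * g                        # ordinary gradient descent
        steps.append(step)
    return steps


def quickprop_minimize(grad_fn, w0, n_iters=30, lr=0.1, mu=1.75):
    """Run Quickprop from w0 using grad_fn(w) -> list of slopes dE/dw."""
    w = [float(x) for x in w0]
    prev_grad = [0.0] * len(w)
    prev_step = [0.0] * len(w)  # zero, so the first step is gradient descent
    for _ in range(n_iters):
        g = grad_fn(w)
        step = quickprop_step(g, prev_grad, prev_step, lr, mu)
        w = [wi + si for wi, si in zip(w, step)]
        prev_grad, prev_step = g, step
    return w


if __name__ == "__main__":
    # Made-up separable error E(w) = sum a_i * (w_i - c_i)^2.
    a, c = [1.0, 3.0], [2.0, -1.0]

    def grad(w):
        return [2 * ai * (wi - ci) for ai, wi, ci in zip(a, w, c)]

    print(quickprop_minimize(grad, [0.0, 0.0]))  # close to [2.0, -1.0]
```

On a true parabola the secant jump lands on the minimum in one step unless the growth cap stops it; real error surfaces are not parabolas, which is why the cap matters.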
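For readers without _Numerical_Recipes_ handy, here is a small illustration of why conjugate gradient beats steepest descent. This is not OPT's code; it is a pure-Python sketch under a strong simplifying assumption: the error surface is a quadratic f(x) = 0.5 x'Ax - b'x with a known symmetric positive definite A, so the line search along each direction can be done exactly. For network training the exact step would have to be replaced by a numerical line minimization. The direction update uses the Polak-Ribiere formula (reset to steepest descent if beta goes negative). On an n-dimensional quadratic with exact line searches, conjugate gradient reaches the minimum in at most n steps, up to rounding, while steepest descent zig-zags. The matrix, vector, and function names are made up for the example.

```python
# Steepest descent vs. Polak-Ribiere conjugate gradient on a quadratic
#   f(x) = 0.5 * x'Ax - b'x,   grad f(x) = Ax - b,
# using exact line searches (possible only because f is quadratic here).
# This is an illustrative sketch, not the OPT program.


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def gradient(A, b, x):
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]


def exact_step(A, g, d):
    """alpha minimizing f(x + alpha * d) when g = grad f(x)."""
    return -dot(g, d) / dot(d, matvec(A, d))


def steepest_descent(A, b, x0, iters):
    x = list(x0)
    for _ in range(iters):
        g = gradient(A, b, x)
        if dot(g, g) < 1e-30:
            break
        d = [-gi for gi in g]                      # move straight downhill
        alpha = exact_step(A, g, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x


def conjugate_gradient(A, b, x0, iters):
    x = list(x0)
    g = gradient(A, b, x)
    d = [-gi for gi in g]                          # first direction: downhill
    for _ in range(iters):
        if dot(g, g) < 1e-30:
            break
        alpha = exact_step(A, g, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = gradient(A, b, x)
        diff = [gn - go for gn, go in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))  # Polak-Ribiere, reset if < 0
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x


if __name__ == "__main__":
    A = [[3.0, 2.0], [2.0, 6.0]]   # made-up symmetric positive definite matrix
    b = [2.0, -8.0]                # minimum of f is at A^-1 b = [2, -2]
    print("steepest descent, 2 steps:  ", steepest_descent(A, b, [0.0, 0.0], 2))
    print("conjugate gradient, 2 steps:", conjugate_gradient(A, b, [0.0, 0.0], 2))
```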