Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!usc!rpi!crdgw1!greenba
From: greenba@gambia.crd.ge.com (ben a green)
Newsgroups: comp.ai.neural-nets
Subject: Re: Are Conjugate Gradient algorithms any good?
Message-ID: <GREENBA.91Mar13081609@gambia.crd.ge.com>
Date: 13 Mar 91 13:16:09 GMT
References: <1991Mar4.142559.21857@daimi.aau.dk> <^9B&5R#@warwick.ac.uk>
	<pluto.668285404@cornelius>
	<91Mar7.145659edt.437@neuron.ai.toronto.edu> <9682@exodus.Eng.Sun.COM>
Sender: news@crdgw1.crd.ge.com
Organization: GE Corporate Research & Development
Lines: 45
In-reply-to: landman@hanami.Eng.Sun.COM's message of 12 Mar 91 21:48:20 GMT

In article <9682@exodus.Eng.Sun.COM> landman@hanami.Eng.Sun.COM (Howard A. Landman) writes:

	...

   Do you think it would be fair to say that training data is typically
   not very large because people simply don't have machines powerful
   enough (or algorithms efficient enough) to deal with anything larger?
   I could easily be running a few hundred thousand patterns of a few
   hundred inputs each into a few thousand neurons, *IF* I had anything
   that could handle it.

In the commercial and military applications we have investigated, training data
are limited by the nature of the problem. Suppose you want to diagnose faults
in aircraft engines from data collected before every takeoff. You have lots
of normal patterns but typically very few patterns for any one kind of
fault.

   One disadvantage of CG methods is that they often require the whole
   training set to be memory-resident.  For gigantic training data this
   can be a real problem.

I don't understand why this is peculiar to CG methods. Any method that updates
the weights repeatedly will want to keep the training set in memory, simply to
avoid being IO-bound.
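
To make the point concrete, here is a minimal Python sketch, not anyone's
production code. It assumes a single linear unit with sum-of-squares error, and
the names chunk_gradient and full_gradient are purely illustrative. The batch
gradient that a CG step needs is a sum over patterns, so it can be accumulated
one chunk at a time; the price is re-reading the data on every pass, which is
the same IO cost any iterative method would pay.

```python
def chunk_gradient(w, chunk):
    """Gradient of 0.5 * sum((w.x - t)^2) over one chunk of (x, t) pairs."""
    g = [0.0] * len(w)
    for x, t in chunk:
        err = sum(wi * xi for wi, xi in zip(w, x)) - t
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g


def full_gradient(w, chunks):
    """Accumulate the batch gradient chunk by chunk.

    `chunks` can be any iterable of lists of (x, t) pairs, for example a
    generator that reads one block of patterns from disk at a time, so only
    one chunk has to be in memory at once.
    """
    total = [0.0] * len(w)
    for chunk in chunks:
        g = chunk_gradient(w, chunk)
        total = [a + b for a, b in zip(total, g)]
    return total


if __name__ == "__main__":
    data = [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0),
            ([2.0, 0.0], 3.0), ([1.0, 1.0], 2.0)]
    w = [0.1, -0.2]
    # The same gradient, computed in one piece and in two chunks.
    print(full_gradient(w, [data]))
    print(full_gradient(w, [data[:2], data[2:]]))
```

The two printed gradients agree up to rounding, whichever way the data are
split.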

   Does anyone have any insights on methods for handling large amounts
   of training data efficiently?

If the hundred thousand patterns are required in order to define the decision
boundary with sufficient precision, then there is no alternative to using them
all. If they are not, you can try training on a sample.
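
One way to sample, sketched in Python under stated assumptions: each pattern
carries a class label, and the abundant classes are capped while the rare ones
(such as individual fault types) are kept whole. The function name subsample
and the per_class_cap parameter are hypothetical, chosen only for illustration.

```python
import random


def subsample(patterns, labels, per_class_cap, seed=0):
    """Keep at most per_class_cap randomly chosen patterns of each class.

    Classes with fewer patterns than the cap are kept in full, so rare
    classes are not thinned out any further.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)

    keep = []
    for idxs in by_class.values():
        if len(idxs) > per_class_cap:
            idxs = rng.sample(idxs, per_class_cap)
        keep.extend(idxs)

    keep.sort()  # preserve the original ordering of the kept patterns
    return [patterns[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    patterns = list(range(1000))
    labels = ["normal"] * 990 + ["fault"] * 10
    xs, ys = subsample(patterns, labels, per_class_cap=50)
    print(len(xs), ys.count("normal"), ys.count("fault"))
```

Whether the reduced set still pins down the decision boundary has to be checked
empirically, for instance by comparing error on held-out patterns as the cap is
raised.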

Sorry if these seem like obvious suggestions, but we have thought a lot about the
problem and have come up with nothing better.

BTW, we use a variety of Partial CG with such success that we have stopped trying
anything else. Maybe it depends on the kind of problem you have. We are mystified
by the complaints against CG. A recent MIT thesis blasted it, but we ran the
same data through our version with excellent results.

BG, 5 kyu
--
Ben A. Green, Jr.              
greenba@crd.ge.com
  Speaking only for myself, of course.