Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!usc!rpi!crdgw1!greenba
From: greenba@gambia.crd.ge.com (ben a green)
Newsgroups: comp.ai.neural-nets
Subject: Re: Are Conjugate Gradient algorithms any good?
Message-ID:
Date: 13 Mar 91 13:16:09 GMT
References: <1991Mar4.142559.21857@daimi.aau.dk> <^9B&5R#@warwick.ac.uk> <91Mar7.145659edt.437@neuron.ai.toronto.edu> <9682@exodus.Eng.Sun.COM>
Sender: news@crdgw1.crd.ge.com
Organization: GE Corporate Research & Development
Lines: 45
In-reply-to: landman@hanami.Eng.Sun.COM's message of 12 Mar 91 21:48:20 GMT

In article <9682@exodus.Eng.Sun.COM> landman@hanami.Eng.Sun.COM
(Howard A. Landman) writes:

   ...

   Do you think it would be fair to say that training data is
   typically not very large because people simply don't have machines
   powerful enough (or algorithms efficient enough) to deal with
   anything larger?  I could easily be running a few hundred thousand
   patterns of a few hundred inputs each into a few thousand neurons,
   *IF* I had anything that could handle it.

In the commercial and military applications we have investigated,
training data are limited by the nature of the problem. Suppose you
want to diagnose faults in aircraft engines from data collected before
every takeoff. You have lots of normal patterns but typically very few
fault patterns of any one kind of fault.

   One disadvantage of CG methods is that they often require the whole
   training set to be memory-resident.  For gigantic training data
   this can be a real problem.

I don't understand why this is peculiar to CG methods. Any method that
requires repeated updating of weights will want to retain the training
set in memory just in order to avoid being IO-bound.

   Does anyone have any insights on methods for handling large amounts
   of training data efficiently?

If the hundred thousand patterns are required in order to define the
decision boundary with sufficient precision, then there is no
alternative. If they are not, you can try sampling.
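
For concreteness, here is a minimal sketch of one way the sampling idea could go. It is not our production code; the function name, the per-class cap, and the "normal"/"fault" labels are all made up for illustration. It assumes every pattern carries a class label, thins out the plentiful classes at random, and keeps every pattern of the scarce ones.

```python
import random
from collections import defaultdict


def subsample_by_class(patterns, labels, max_per_class, seed=0):
    """Keep at most max_per_class randomly chosen patterns of each class.

    Classes with fewer patterns than the cap (for example rare faults)
    are kept whole. All names here are hypothetical.
    """
    rng = random.Random(seed)

    # Group pattern indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    kept = []
    for indices in by_class.values():
        if len(indices) > max_per_class:
            # Abundant class: draw a random subset without replacement.
            indices = rng.sample(indices, max_per_class)
        kept.extend(indices)

    kept.sort()  # retain the original ordering of the surviving patterns
    return [patterns[i] for i in kept], [labels[i] for i in kept]


# Toy data: 990 normal patterns and 10 fault patterns.
patterns = list(range(1000))
labels = ["normal"] * 990 + ["fault"] * 10
small_patterns, small_labels = subsample_by_class(patterns, labels, 50)
```

Whether a reduced set like this still pins down the decision boundary well enough is something you can only find out by checking results against held-out data.
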
Sorry if these seem like obvious suggestions, but we have thought a lot
about the problem and have come up with nothing better.

BTW, we use a variety of Partial CG with such success that we have
stopped trying anything else. Maybe it depends on the kind of problem
you have. We are mystified by the complaints against CG. A recent MIT
thesis blasted it, but we ran the same data through our version with
excellent results.

BG, 5 kyu
--
Ben A. Green, Jr.
greenba@crd.ge.com
Speaking only for myself, of course.