Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!sun-barr!newstop!exodus!hanami.Eng.Sun.COM!landman
From: landman@hanami.Eng.Sun.COM (Howard A. Landman)
Newsgroups: comp.ai.neural-nets
Subject: Re: Are Conjugate Gradient algorithms any good?
Message-ID: <10184@exodus.Eng.Sun.COM>
Date: 21 Mar 91 02:10:09 GMT
References: <1991Mar4.142559.21857@daimi.aau.dk> <^9B&5R#@warwick.ac.uk> <91Mar7.145659edt.437@neuron.ai.toronto.edu> <9682@exodus.Eng.Sun.COM>
Sender: news@exodus.Eng.Sun.COM
Organization: Sun Microsystems, Mt. View, Ca.
Lines: 49

In article <9682@exodus.Eng.Sun.COM> I wrote:
>>One disadvantage of CG methods is that they often require the whole
>>training set to be memory-resident. For gigantic training data this
>>can be a real problem.

In article greenba@gambia.crd.ge.com (ben a green) writes:
>I don't understand why this is peculiar to CG methods. Any method that requires
>repeated updating of weights will want to retain the training set in memory
>just in order to avoid being IO-bound.

Assuming that the entire training set can fit into your virtual memory, that's true, although page faults can cause that to become "IO-bound" as well. But I had one case where the training data was over 500 MB. Since my VM size was less than 500 MB, the program which required data to be memory-resident simply DIDN'T WORK, but a program that read the data each pass would merely have been slow.

A more subtle aspect: in some cases (e.g. mine), the "original" training data is far more dense than the training data which has been massaged into the input format (or memory layout) of the program. In extreme cases (e.g. mine :-) the difference can be greater than two orders of magnitude (2 MB vs 500 MB).
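To make the density point concrete, here is a minimal Python sketch of "one sample at a time" training with the expansion done on the fly. It assumes a hypothetical compact format (each record is a list of the input indices that are "on" plus a target value); the names `expand`, `stream_expanded` and `train_epoch` are illustrative only and do not come from "opt" or any other real program. A single linear unit trained with the delta rule stands in for the real network. The generator builds one dense vector at a time, so the fully expanded training set never exists; the price is re-expanding every record on every pass.

```python
from typing import Iterable, Iterator, List, Tuple

# HYPOTHETICAL compact format (not from "opt" or the original post):
# each record is (indices of inputs that are "on", target value).
CompactRecord = Tuple[List[int], float]


def expand(record: CompactRecord, n_inputs: int) -> Tuple[List[float], float]:
    """Expand one compact record into a dense input vector plus its target."""
    active, target = record
    x = [0.0] * n_inputs
    for i in active:
        x[i] = 1.0
    return x, target


def stream_expanded(
    records: Iterable[CompactRecord], n_inputs: int
) -> Iterator[Tuple[List[float], float]]:
    """Yield expanded samples one at a time, so only one dense vector
    exists at any moment; the full expanded set is never built."""
    for rec in records:
        yield expand(rec, n_inputs)


def train_epoch(
    weights: List[float],
    records: Iterable[CompactRecord],
    n_inputs: int,
    lr: float = 0.1,
) -> List[float]:
    """One pass of per-sample (online) delta-rule updates on a single
    linear unit, standing in for the real network. Every record is
    re-expanded on every pass: CPU is traded for memory."""
    for x, t in stream_expanded(records, n_inputs):
        y = sum(w * xi for w, xi in zip(weights, x))
        err = t - y
        for i, xi in enumerate(x):
            weights[i] += lr * err * xi
    return weights
```

Here `records` could be an in-memory list of compact records, or a fresh generator that reads the compact file from disk at the start of each epoch; either way, only the compact form (2 MB in the case above) ever needs to be resident, never the expanded form.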
For a "one sample at a time" program, if you have source, it is possible to embed the code to do this expansion in the program itself, so that the entire expanded training set never exists anywhere, and all the swap & I/O problems vanish (at the cost in CPU of re-expanding the data each time). For an "all data in memory at once" program, you don't have that choice; the whole expanded data set must exist in memory even if you can avoid having it on disk.

Even if embedded expansion is possible, it may not always make sense. That depends on the relative expense of expanding versus the performance cost of doing the training with a fully expanded data set. In my case, expanding the data was about 1/8th the cost of doing a single CG training cycle, so the overhead would have been quite acceptable as long as each training cycle took more than 1/8 (12.5%) less time when the program's virtual image was 3 MB in size than when it was 500 MB in size. (There is also an implicit assumption here that each datum only needs to be expanded once per training cycle. This is clearly true for the "one sample at a time" approaches, but may not be for CG.) Even if that speedup were not forthcoming, at least the program would have been able to handle the full set of training data.

You're right that this doesn't *necessarily* have anything to do with CG per se, except that the CG program I chose to use ("opt") had the above features. I think that line search may more-or-less require all training data to be present. If anyone knows of a CG program which *doesn't* need all data in-memory, please describe it.

--
Howard A. Landman
landman@eng.sun.com -or- sun!landman