Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!mentor.cc.purdue.edu!j.cc.purdue.edu!pur-ee!pc.ecn.purdue.edu!cb.ecn.purdue.edu!kavuri
From: kavuri@cb.ecn.purdue.edu (Surya N Kavuri )
Newsgroups: comp.ai.neural-nets
Subject: Re: Training
Message-ID: <769@cb.ecn.purdue.edu>
Date: 22 Mar 89 04:07:05 GMT
References: <2698@sun.soe.clarkson.edu> <2351@buengc.BU.EDU> <1577@vicom.COM>
Organization: Purdue University Engineering Computer Network
Lines: 37

In article <1577@vicom.COM>, hal@vicom.COM (Hal Hardenbergh) writes:
>
> A colleague and I have tried several of the back-prop "speedup" methods. All
> that we have tried do speed up convergence, to some degree, as determined by
> the number of epochs (training iterations). However, none of them reliably
> provide a speed improvement as measured by wall clock time.
>
> The ones which do (sometimes) provide a slight improvement in wall clock time
> do not do so reliably. It's sort of like varying the convergence and momentum
> factors. Depending on the random initialization of the weights and biases,
> c1 and m1 will work better than c2 and m2, and vice versa.
>
> As long as we are simulating artificial neural nets in software (if simulating
> is the right word here), does anyone know of a back-prop speedup trick which
> reduces the wall-clock training time?
>
> Hal Hardenbergh [incl std dsclmr] hal@vicom.com

I believe this could be tried by using other inexact search methods with BP.
REF: "Conjugate-gradient methods with inexact searches", in
     Mathematics of Operations Research, vol. 3.

Gradient evaluation is an expensive computation in BP. A gradient-reuse
algorithm has been suggested, and its idea is the following:
the same gradient is reused for several weight updates, until the resulting
updates no longer lead to a reduction in error. Only then is a new gradient
computed. Further, the use of batching makes the search direction more
accurate.
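
As a rough illustration of the gradient-reuse idea described above, here is
a minimal Python sketch. It is not the published algorithm. A single linear
unit stands in for a BP network, and the function names, the fixed step size
`rate`, and the halving of `rate` when a fresh gradient gives no reduction
are all my own assumptions, made only for illustration. The error used is
the total squared error over all patterns.

```python
# Sketch only: a single linear unit stands in for a back-prop network.
# All names below are hypothetical and chosen for illustration.

def predict(w, x):
    """Output of a linear unit with weights w on input x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def batch_error(w, patterns):
    """Total error: squared differences summed over all patterns."""
    return sum((t - predict(w, x)) ** 2 for x, t in patterns)

def batch_gradient(w, patterns):
    """Gradient of batch_error with respect to each weight."""
    grad = [0.0] * len(w)
    for x, t in patterns:
        diff = predict(w, x) - t
        for i, xi in enumerate(x):
            grad[i] += 2.0 * diff * xi
    return grad

def train_gradient_reuse(w, patterns, rate=0.01, max_gradients=100, tol=1e-12):
    """Reuse each gradient for repeated steps while the error keeps falling."""
    w = list(w)
    err = batch_error(w, patterns)
    n_gradients = 0
    while n_gradients < max_gradients and err > tol:
        grad = batch_gradient(w, patterns)  # the expensive computation
        n_gradients += 1
        reused = 0
        while True:
            # Step again along the SAME gradient (no new gradient evaluation).
            trial = [wi - rate * gi for wi, gi in zip(w, grad)]
            trial_err = batch_error(trial, patterns)
            if trial_err >= err:
                break  # no further reduction: discard this trial step
            w, err = trial, trial_err
            reused += 1
        if reused == 0:
            # Assumption (not from the post): shrink the step if even the
            # first step along a fresh gradient fails to reduce the error.
            rate *= 0.5
    return w, err, n_gradients
```

Calling train_gradient_reuse([0.0, 0.0], patterns) on a small list of
(input, target) pairs returns the final weights, the final total error, and
the number of gradient evaluations. That last count is the quantity the
method tries to keep small, since each reuse step costs only an error
evaluation.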
Batching means: taking the total error (to be minimized) to be the sum of
squared differences between the desired output and the observed output,
summed over all patterns.

SURYA
(FIAT LUX)