Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!sharkey!itivax!dhw
From: dhw@itivax.iti.org (David H. West)
Newsgroups: comp.ai.neural-nets
Subject: Re: Back Propagation question... (follow up)
Message-ID: <1361@itivax.iti.org>
Date: 30 May 89 20:09:55 GMT
References: <226@cs.columbia.edu>
Reply-To: dhw@itivax.UUCP (David H. West)
Organization: Industrial Technology Institute
Lines: 24

In article <226@cs.columbia.edu> camargo@cs.columbia.edu (Francisco Camargo) writes:
]Hi there,
]| "dumping" (you mean "weighting") each change by 1/k, where k is the number
]| of the example (?) sounds really weird, do you mean if you had four examples
]| in your training set changes from the fourth would be worth only a quarter
]| as much as changes from the second? surely you don't mean this!

]| tony plate

]My problem is that I can't find any (theoretical) justification for the "online"
]method other than the "Robins Monroe algorithm" (I may have misspelled his name,
]for which I apologize, but I don't have my references nearby). But then, the
]"dumping" factor is required for guaranteed convergence. I tried the "online"
]method and it does seem to perform better. But WHY does it work? How come it
]converges so well (despite setting {a_k}=1)?

]/Kiko.
]camargo@cs.columbia.edu

The 1/k factor is related to an old statistical hack for calculating the
change in the mean of a set of observations when another is added.  That
formula takes 2 or 3 lines of algebra to derive, on a bad day.
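
For anyone who wants the algebra spelled out, here is one way it can go
(the names m_k, x_k and running_mean are mine, chosen for illustration).
If m_k is the mean of the first k observations x_1..x_k, then

    m_k = ((k-1)*m_{k-1} + x_k) / k = m_{k-1} + (x_k - m_{k-1}) / k

so the change in the mean is the "error" on the k-th observation weighted
by 1/k.  A minimal sketch in Python, which only illustrates the mean
update and is not a backprop implementation:

```python
def running_mean(observations):
    """Yield the mean after each observation, using the 1/k update."""
    mean = 0.0
    for k, x in enumerate(observations, start=1):
        # Change in the mean when the k-th observation is added:
        # (x - mean) / k, i.e. the error on x weighted by 1/k.
        mean += (x - mean) / k
        yield mean


data = [2.0, 4.0, 9.0, 1.0]  # made-up observations
means = list(running_mean(data))
# The last running mean agrees with the ordinary batch mean.
batch_mean = sum(data) / len(data)
```

With the weight held at a constant instead of 1/k, the same update would
give an exponentially weighted average rather than the exact mean.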

-David       dhw@itivax.iti.org