Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!sharkey!itivax!dhw
From: dhw@itivax.iti.org (David H. West)
Newsgroups: comp.ai.neural-nets
Subject: Re: Back Propagation question... (follow up)
Message-ID: <1361@itivax.iti.org>
Date: 30 May 89 20:09:55 GMT
References: <226@cs.columbia.edu>
Reply-To: dhw@itivax.UUCP (David H. West)
Organization: Industrial Technology Institute
Lines: 24

In article <226@cs.columbia.edu> camargo@cs.columbia.edu (Francisco Camargo) writes:
]Hi there,
]| "dumping" (you mean "weighting") each change by 1/k, where k is the number
]| of the example (?) sounds really weird, do you mean if you had four examples
]| in your training set changes from the fourth would be worth only a quarter
]| as much as changes from the second? surely you don't mean this!
]| tony plate
]My problem is that I can't find any (theoretical) justification for the "online"
]method other than "Robins Monroe algorithm" (I may have misspelled his name,
]for which I apologize, but I don't have my references near by). But then, the
]"dumping" factor is required for guaranteed convergence. I tried the "online"
]method and it does seem to perform better. But, WHY does it work ? How come it
]converges so well (despite making {a_k}=1) ?
]/Kiko.
]camargo@cs.columbia.edu

It's related to an old statistical hack for calculating the change in the
mean of a set of observations when another is added. That formula takes
2 or 3 lines of algebra to derive, on a bad day.

-David       dhw@itivax.iti.org
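
The "old statistical hack" is presumably the incremental update of a mean, which is where a 1/k weighting comes from. If m_k is the mean of the first k observations x_1..x_k, then

    m_k = ((k-1)*m_(k-1) + x_k) / k = m_(k-1) + (x_k - m_(k-1)) / k

so each new observation moves the estimate by 1/k of its difference from the current mean. A minimal sketch in Python follows; the names running_mean and batch_mean are illustrative only and do not come from the thread.

```python
def running_mean(observations):
    """Yield the mean after each observation, using the 1/k update."""
    mean = 0.0
    for k, x in enumerate(observations, start=1):
        # Move the current estimate by 1/k of the new observation's error.
        mean += (x - mean) / k
        yield mean


def batch_mean(observations):
    """Ordinary mean over the whole set, for comparison."""
    return sum(observations) / len(observations)


data = [2.0, 4.0, 6.0, 8.0]
means = list(running_mean(data))
```

After the last observation the running value agrees with batch_mean(data). This only illustrates the 1/k step size; it does not by itself explain why a constant step ({a_k}=1) also converges in practice.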