Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!columbia!cs!camargo
From: camargo@cs.columbia.edu (Francisco Camargo)
Newsgroups: comp.ai.neural-nets
Subject: Back Propagation question... (follow up)
Message-ID: <226@cs.columbia.edu>
Date: 30 May 89 14:18:30 GMT
Organization: Columbia University Department of Computer Science
Lines: 85

Hi there,

I'm re-posting my previous message together with a reply that I received
from Tony Plate and my reply to him. I'd really appreciate comments on
this issue. Thanks to all.

-----------------------------------------------------------------------------
|In article <224@cs.columbia.edu> you write:
||
||Can anyone shed some light on the following issue:
||
||How should one compute the weight adjustments in BackProp?
||From reading PDP, one gathers the impression that the DELTAS
||should be accumulated over all INPUT PATTERNS and only then
||a STEP is taken down the gradient. Robbins and Monro suggest
||a stochastic algorithm with proven convergence if one takes one
||step at each pattern presentation, but dumps its effect by a factor
||1/k where "k" is the presentation number. Other people (from code
||that I've seen flying around) seem to take a STEP at each presentation
||and don't take into account any dumping factors. I've tried both
||approaches myself and they both seem to work. So which is the correct way
||of adjusting the weights? Accumulate the errors over all patterns? Or work
||towards the minimum as new patterns are presented? What are the implications?
||
||Any light on this issue is extremely appreciated.
||
-----------------------------------------------------------------------------
| There are two standard methods of doing the updates, sometimes called
| "batch" and "online" learning.
|
| In "batch" learning, all the changes are accumulated for one pass through
| all the examples. At the end of the pass (or "epoch") the update is made.
| Thus, each link requires an extra storage field in which to accumulate
| the changes.
|
| In "online" learning, the change is made after seeing each example.
|
| Some people claim online is better, others claim batch is better.
|
| "dumping" (you mean "weighting") each change by 1/k, where k is the number
| of the example (?) sounds really weird. Do you mean that if you had four
| examples in your training set, changes from the fourth would be worth only
| a quarter as much as changes from the first? Surely you don't mean this!
|
| Some people use a momentum term, and some change the learning rate during
| learning. Using momentum seems to be generally a good thing, and it's
| easy to do. Automatically changing the learning rate is much harder.
|
| .....
| ..... Connectionist Learning Algorithms by Hinton....
| .....
|
| tony plate
------------------------------------------------------------------------------

Hi Tony,

Sorry for my previous message being so unspecific. What I meant is that the
dumping occurs after each "epoch." The idea is that the changes in the weights
tend to be of lesser and lesser importance. Actually, the way the algorithm is
stated, one should dump (I really mean dump) the step size by a series of
terms {a_k} where "sum({a_k}^2)