Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ncar!boulder!bill
From: bill@boulder.Colorado.EDU
Newsgroups: comp.ai.neural-nets
Subject: Re: : Step Function
Keywords: bias in learning,generalization
Message-ID: <11384@boulder.Colorado.EDU>
Date: 6 Sep 89 02:24:07 GMT
References: <1060@rex.cs.tulane.edu> <6980@sdcsvax.UCSD.Edu> <17538@bellcore.bellcore.com> <1727@cbnewsl.ATT.COM> <7011@sdcsvax.UCSD.Edu> <11308@boulder.Colorado.EDU> <7024@sdcsvax.UCSD.Edu>
Sender: news@boulder.Colorado.EDU
Reply-To: bill@synapse.Colorado.EDU ()
Organization: University of Colorado, Boulder
Lines: 44

> You might be interested in some approaches to learning theory
> in which the device has access to a teacher that can provide more
> than just an error-signal ... e.g., once a hypothesis is formed by
> the device, the teacher, rather than just saying whether or not
> the hypothesis is correct for given values of inputs, can provide
> counterexamples to the device. Of course, this requires a teacher
> with expert knowledge! And, the ability to ascertain what
> hypothesis the device currently entertains.
>
> Applying this idea to neural networks is difficult.
> The question is: how to apportion this type of training
> signal to the appropriate units in the network. And, once
> received by the pertinent units, what to make of it.
> Any ideas on how to backpropagate such training information?
> "Commentary" feedback can apply to the entire hypothesis formed
> by the device, not just its performance on a particular input.

In principle, if you can come up with the training information, you
can use back-prop (or something very much like it) to apply it.
Back-prop is essentially gradient descent for an error function.
Usually the error function is either the total squared error in the
output layer for a randomly chosen input (this is "online" back-prop),
or the average total squared error for a set of inputs (which is
"batch-mode" back-prop).
But as far as the mathematics is concerned, the error function can be
anything you please. For a neural network, the "hypothesis" it forms
is encoded by its weights, and any sort of "comment" you please can be
jammed into the error function, if only you can formulate it as a
numerical measure calculable from the weights.

In fact, a few people have already been trying to do that sort of
thing. As you probably know, a persistent problem with back-prop is
that it tends to give networks that don't generalize very well to new
inputs. One hope for getting better performance is to augment the
error function with an extra term representing the "complexity" of the
network, since it seems intuitively reasonable that simpler networks
should generalize better. It isn't obvious how best to measure
complexity: maybe the sum of the magnitudes of all the weights; or
maybe the number of weights exceeding some fixed threshold; or maybe
something else. This is work in progress, but there have been some
promising beginnings. (Perhaps somebody directly involved can
comment.)
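
To make the idea of an augmented error function concrete, here is a
minimal sketch in Python. It is only an illustration under several
assumptions of mine: a single linear unit stands in for a whole
network, the complexity measure is the sum of the magnitudes of the
weights, and every name in it (penalty_strength, train, and so on) is
hypothetical. It does batch-mode gradient descent on the average
squared error plus the complexity term.

```python
# Sketch only: a single linear unit stands in for a network, and all
# names and constants here are hypothetical choices for illustration.

def predict(weights, x):
    """Output of one linear unit: the weighted sum of its inputs."""
    return sum(w * xi for w, xi in zip(weights, x))

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def total_error(weights, data, penalty_strength):
    """Average squared error plus a complexity term (sum of |w|)."""
    fit = sum((predict(weights, x) - t) ** 2 for x, t in data) / len(data)
    complexity = sum(abs(w) for w in weights)
    return fit + penalty_strength * complexity

def gradient(weights, data, penalty_strength):
    """Gradient of total_error with respect to each weight.

    sign(w) is used as a subgradient of |w| (taken as 0 at w == 0).
    """
    grad = [0.0] * len(weights)
    for x, t in data:
        err = predict(weights, x) - t
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(data)
    return [g + penalty_strength * sign(w) for g, w in zip(grad, weights)]

def train(data, n_weights, penalty_strength, rate=0.05, steps=2000):
    """Batch-mode gradient descent on the augmented error function."""
    weights = [0.1] * n_weights
    for _ in range(steps):
        grad = gradient(weights, data, penalty_strength)
        weights = [w - rate * g for w, g in zip(weights, grad)]
    return weights

# Toy data: the target depends only on the first input.
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), 0.0),
        ((1.0, 1.0), 2.0), ((2.0, 1.0), 4.0)]

plain = train(data, 2, penalty_strength=0.0)      # squared error only
penalized = train(data, 2, penalty_strength=0.1)  # with complexity term

print("plain weights:    ", plain)
print("penalized weights:", penalized)
```

The only change from ordinary gradient descent is the extra term added
to the gradient. A different complexity measure would replace the
`complexity` line and its derivative. Note that a count of weights
exceeding a threshold is piecewise constant, so its gradient is zero
almost everywhere; it would presumably need a smooth approximation
before gradient descent could make use of it.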