Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ncar!boulder!bill
From: bill@boulder.Colorado.EDU
Newsgroups: comp.ai.neural-nets
Subject: Re: : Step Function
Keywords: bias in learning,generalization
Message-ID: <11384@boulder.Colorado.EDU>
Date: 6 Sep 89 02:24:07 GMT
References: <1060@rex.cs.tulane.edu> <6980@sdcsvax.UCSD.Edu> <17538@bellcore.bellcore.com> <1727@cbnewsl.ATT.COM> <7011@sdcsvax.UCSD.Edu> <11308@boulder.Colorado.EDU> <7024@sdcsvax.UCSD.Edu>
Sender: news@boulder.Colorado.EDU
Reply-To: bill@synapse.Colorado.EDU ()
Organization: University of Colorado, Boulder
Lines: 44


>  You might be interested in some approaches to learning theory
>  in which the device has access to a teacher that can provide more
>  than just an error signal ... e.g., once a hypothesis is formed by
>  the device, the teacher, rather than just saying whether or not
>  the hypothesis is correct for given values of inputs, can provide
>  counterexamples to the device.  Of course, this requires a teacher
>  with expert knowledge!  And, the ability to ascertain what 
>  hypothesis the device currently entertains.  
>
>  Applying this idea to neural networks is difficult.  
>  The question is: how to apportion this type of training
>  signal to the appropriate units in the network.  And, once
>  received by the pertinent units, what to make of it.  
>  Any ideas on how to backpropagate such training information?
>  "Commentary" feedback can apply to the entire hypothesis formed 
>  by the device, not just its performance on a particular input.   

  In principle, if you can come up with the training information,
you can use back-prop (or something very much like it) to apply it.

  Back-prop is essentially gradient descent for an error function.
Usually the error function is either the total squared error in
the output layer for a randomly chosen input (this is "online" back-
prop), or the average total squared error for a set of inputs (which
is "batch-mode" back-prop).  But as far as the mathematics is concerned,
the error function can be any function of the weights you please, so
long as it is differentiable, since gradient descent needs a gradient.
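
  To make the online/batch distinction concrete, here is a minimal
sketch for a single linear unit with squared error.  The toy data, the
learning rate, and all of the function names are my own assumptions for
illustration; a real network would propagate the same kind of gradient
back through its hidden layers.

```python
import random

def predict(w, x):
    """Output of a single linear unit: dot product of weights and inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))

def grad_squared_error(w, x, t):
    """Gradient of (predict(w, x) - t)**2 with respect to w."""
    err = predict(w, x) - t
    return [2.0 * err * xi for xi in x]

def online_step(w, data, lr, rng):
    """'Online' step: descend the error of one randomly chosen example."""
    x, t = rng.choice(data)
    g = grad_squared_error(w, x, t)
    return [wi - lr * gi for wi, gi in zip(w, g)]

def batch_step(w, data, lr):
    """'Batch-mode' step: descend the average error over all examples."""
    g = [0.0] * len(w)
    for x, t in data:
        for i, gi in enumerate(grad_squared_error(w, x, t)):
            g[i] += gi / len(data)
    return [wi - lr * gi for wi, gi in zip(w, g)]

def mean_squared_error(w, data):
    """Average squared error of the unit over the whole data set."""
    return sum((predict(w, x) - t) ** 2 for x, t in data) / len(data)

# Hypothetical toy data, generated by the target t = 2*x0 - 1*x1.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

rng = random.Random(0)          # fixed seed so the run is repeatable
w_online = [0.0, 0.0]
w_batch = [0.0, 0.0]
for _ in range(200):
    w_online = online_step(w_online, data, lr=0.1, rng=rng)
    w_batch = batch_step(w_batch, data, lr=0.1)

print("online:", w_online, mean_squared_error(w_online, data))
print("batch: ", w_batch, mean_squared_error(w_batch, data))
```

  Both loops are descending an error function; they differ only in
which error function they look at on each step.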

  For a neural network, the "hypothesis" it forms is encoded by its
weights, and any sort of "comment" you please can be jammed into the
error function, if only you can formulate it as a numerical measure
calculable from the weights.
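
  As a rough sketch of what "jamming a comment into the error function"
might look like, suppose the teacher's comment is "your hypothesis
should not depend on the second input."  That particular comment, the
penalty I use to encode it, the strength of 10, and the toy data are all
hypothetical choices of mine.  The gradient is estimated by finite
differences purely so that the descent routine works for any error
function of the weights; back-prop would compute it analytically.

```python
def numerical_gradient(error_fn, w, eps=1e-6):
    """Central-difference estimate of the gradient of error_fn at w."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grad.append((error_fn(up) - error_fn(down)) / (2 * eps))
    return grad

def descend(error_fn, w, lr=0.05, steps=500):
    """Plain gradient descent on an arbitrary error function of w."""
    for _ in range(steps):
        g = numerical_gradient(error_fn, w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical toy data for a linear unit y = w[0]*x0 + w[1]*x1.
DATA = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.2), ([1.0, 1.0], 1.2)]

def fit_error(w):
    """The usual term: total squared error over the data."""
    return sum((w[0] * x[0] + w[1] * x[1] - t) ** 2 for x, t in DATA)

def comment_error(w):
    """The teacher's comment as a number: how much input 1 is used."""
    return w[1] ** 2

def total_error(w):
    """Data fit plus the comment, weighted by an assumed strength of 10."""
    return fit_error(w) + 10.0 * comment_error(w)

w_plain = descend(fit_error, [0.0, 0.0])
w_commented = descend(total_error, [0.0, 0.0])
print("without comment:", w_plain)
print("with comment:   ", w_commented)
```

  With the comment term included, the second weight is pushed toward
zero at some cost in fit to the data, which is the trade-off one would
expect.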

  In fact, a few people have already been trying to do that sort of
thing.  As you probably know, a persistent problem with back-prop
is that it tends to give networks that don't generalize very well to
new inputs.  One hope for getting better performance is to augment
the error function with an extra term representing the "complexity"
of the network, since it seems intuitively reasonable that simpler
networks should generalize better.  It isn't obvious how best to
measure complexity:  maybe the sum of the magnitudes of all the weights;
or maybe the number of weights whose magnitude exceeds some fixed
threshold; or maybe something else.  This is work in progress, but there
have been some promising beginnings.  (Perhaps somebody directly
involved can comment.)
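
  For concreteness, the two complexity measures mentioned above might be
written as follows.  This is only a sketch: the function names, the
penalty strength, and the example weights are assumptions of mine, not
anybody's published method.  Note that the sum of magnitudes has a
usable gradient (the sign of each weight), whereas the count of weights
above a threshold is piecewise constant, so it cannot be handed to
gradient descent without some smoothing.

```python
def sum_of_magnitudes(weights):
    """Complexity as the sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)

def count_above_threshold(weights, threshold):
    """Complexity as the number of weights with magnitude > threshold."""
    return sum(1 for w in weights if abs(w) > threshold)

def augmented_error(data_error, weights, strength):
    """Ordinary error plus a complexity term (sum of magnitudes here)."""
    return data_error + strength * sum_of_magnitudes(weights)

def sign(w):
    """-1, 0, or +1: the derivative of abs(w) away from zero."""
    return (w > 0) - (w < 0)

def penalized_step(weights, data_grad, lr, strength):
    """One gradient step on data_error + strength * sum(|w|).

    data_grad is the gradient of the ordinary error, however obtained
    (e.g. by back-prop); the penalty adds strength * sign(w) to it.
    """
    return [w - lr * (g + strength * sign(w))
            for w, g in zip(weights, data_grad)]

weights = [0.5, -2.0, 0.01, 3.0]    # hypothetical example weights
print(sum_of_magnitudes(weights))
print(count_above_threshold(weights, 0.1))

# With a zero data gradient, the penalty alone shrinks every weight
# toward zero by lr * strength per step.
print(penalized_step(weights, [0.0, 0.0, 0.0, 0.0], lr=0.001, strength=1.0))
```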