Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!rutgers!dptg!att!cbnewsl!apr
From: apr@cbnewsl.ATT.COM (anthony.p.russo)
Newsgroups: comp.ai.neural-nets
Subject: Re: : Step Function. Biases are necessary
Keywords: learning,generalization
Message-ID: <1851@cbnewsl.ATT.COM>
Date: 13 Sep 89 11:55:11 GMT
References: <1060@rex.cs.tulane.edu> <6980@sdcsvax.UCSD.Edu> <2934@arisia.Xerox.COM>
Organization: AT&T Bell Laboratories
Lines: 88

Ron Chrisley wrote:

> I claim that although there might be algorithms that learn generalization
> biases for which the boundary cases provide quickest learning, there are
> also algorithms for which this is not the case.  For instance, some algorithms
> may learn biases better if you provide exemplars.
> 
> I know this is exactly what you are claiming to not be the case, but I don't
> yet see an argument.  What is the difference between -1:1 and -100000:100000?
> If there is a difference in the quality of bias learning, I am sure that it
> is dependent on some assumptions concerning the bias learning algorithm you
> have in mind, or concerning the nature of the data.
> 
> The "boundary is best" does not seem to be true for arbitrary learning
> algorithms, especially for particular generalization tasks.  Consider a 1D
> task, where everything within distance D of the origin is in cat 1, and
> all points outside of this region are in cat 2.  Now consider the following
> way of learning bias:  Start with the bias that after seeing n samples, you
> will categorize everything within radius r of any of the samples as the class
> of those samples, r being small.  

***
I think that since the task is defined with respect to the origin, the bias should be too.
Then only the distance D would need to be learned, and all the information
about D would be contained in samples near the boundary at radius D.
For instance, when I talk of learning boundaries, my bias must be that
everything in between those boundaries is of the same class.
***
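Under that bias the hypothesis space is a single number D, and each labeled sample narrows it: a category-1 point at x says D >= |x|, and a category-2 point says D < |x|. The minimal Python sketch below computes the interval of radii still consistent with the data. The function name and the (x, label) sample format are assumptions made for illustration. Samples close to |x| = D leave a narrow interval, while samples far from it leave a wide one.

```python
def consistent_radius_interval(samples):
    """Return (lo, hi) such that every radius D with lo <= D < hi fits the samples.

    samples: list of (x, label) pairs (hypothetical format); label 1 means
    |x| <= D (inside the region), label 2 means |x| > D (outside).
    """
    # Every category-1 point forces D >= |x|.
    lo = max((abs(x) for x, label in samples if label == 1), default=0.0)
    # Every category-2 point forces D < |x|.
    hi = min((abs(x) for x, label in samples if label == 2), default=float("inf"))
    if lo >= hi:
        raise ValueError("no single radius D is consistent with these samples")
    return lo, hi
```

With a true D of 1.0, the near-boundary set [(0.95, 1), (-0.97, 1), (1.04, 2), (-1.02, 2)] pins D to [0.97, 1.02), while the far set [(0.1, 1), (-0.2, 1), (5.0, 2), (-4.0, 2)] only narrows it to [0.2, 4.0).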

> Then, r is increased in a least squares way,
> until generalization error is minimized.  Clearly, it would be best to use
> samples near the origin to train this task/bias learning algorithm combination

> [ sound argument deleted ]
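The following is a rough Python sketch of the bias-learning scheme quoted above, and it rests on several assumptions. A point gets the label of the nearest training sample within distance r. Points farther than r from every sample are left unclassified and counted as errors. The "least squares" step is replaced here by a plain grid search over candidate radii, scored by misclassification rate on held-out points. All names are hypothetical.

```python
def classify(x, samples, r):
    """Label of the nearest training sample within distance r of x, else None.

    Assumption: conflicts are resolved by the nearest sample, and points
    farther than r from every sample are left unclassified.
    """
    best_label, best_dist = None, r
    for xs, label in samples:
        d = abs(x - xs)
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label


def error_rate(samples, r, validation):
    """Fraction of validation points misclassified (None counts as wrong)."""
    wrong = sum(1 for x, label in validation if classify(x, samples, r) != label)
    return wrong / len(validation)


def learn_radius(samples, validation, radii):
    """First radius in `radii` with minimal validation error.

    Passing radii in increasing order mimics "start with r small and grow it".
    """
    return min(radii, key=lambda r: error_rate(samples, r, validation))
```

For example, with training samples [(-0.5, 1), (0.0, 1), (0.5, 1), (-2.0, 2), (2.0, 2)] and candidate radii [0.1, 0.5, 1.0, 3.0], a radius of 0.1 leaves most held-out points unclassified. A radius of 0.5 is the smallest that classifies the held-out grid correctly.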

> Don't get me wrong, I do think that learning near the boundaries, à la LVQ2,
> is a good idea.  But I don't think it is a good idea for all tasks, I am
> not convinced that it is a good way to learn 2nd-order *biases* (as opposed
> to 1st order distributions), and even if it is good for that, I question
> whether it has anything to do with the fact that generalization = bias, as
> opposed to the Bayesian arguments Prof. Kohonen gives.  If it were true for
> Bayesian reasons, you would also probably be assuming that the bias learning
> is performed after you already have a relatively good solution to the problem.
> 
> The reason why it was not a good idea in the example I gave was because that
> bias learning alg needs information about the entire distribution.  Only
> looking at boundaries throws that away.
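For readers unfamiliar with LVQ2, the sketch below shows what "learning near the boundaries" means there. It follows the rule as it is commonly described: update only when the nearest codebook vector has the wrong class and the second-nearest has the right one, and only when x falls inside a window around the midplane, min(d_i/d_j, d_j/d_i) > s with s = (1 - w)/(1 + w). This is an illustrative Python sketch, not Kohonen's code. The alpha and window values are assumptions, and in practice alpha is decreased over training.

```python
import math


def lvq2_step(codebook, x, label, alpha=0.05, window=0.3):
    """Apply one LVQ2-style update for training vector x with class `label`.

    codebook: list of [vector, class] pairs, where vector is a list of floats;
    vectors are modified in place. Returns True if an update was made.
    alpha and window are illustrative values, not recommendations.
    """
    # Nearest and second-nearest codebook vectors.
    ranked = sorted(codebook, key=lambda entry: math.dist(entry[0], x))
    (m_i, c_i), (m_j, c_j) = ranked[0], ranked[1]
    d_i, d_j = math.dist(m_i, x), math.dist(m_j, x)

    # Act only when the nearest vector has the wrong class and the
    # runner-up has the right one, i.e. x is misclassified near a border.
    if c_i == label or c_j != label:
        return False

    # Window rule: x must fall close to the midplane between m_i and m_j.
    # Since d_i <= d_j, the smaller of the two distance ratios is d_i / d_j.
    s = (1 - window) / (1 + window)
    if d_j == 0 or d_i / d_j <= s:
        return False

    for k in range(len(x)):
        m_i[k] -= alpha * (x[k] - m_i[k])  # push the wrong-class vector away
        m_j[k] += alpha * (x[k] - m_j[k])  # pull the right-class vector closer
    return True
```

With codebook vectors at 0.0 (class 1) and 1.0 (class 2), a class-2 sample at 0.45 lies in the window and moves both vectors. A class-2 sample at 0.05 sits deep inside the class-1 region and is ignored. Only near-border samples shape the codebook.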

***
Bayesian classifiers are really boundary sets.
The boundaries are *calculated* from a priori knowledge of the distributions, but
once a boundary is calculated, the information about the entire distribution
*is* thrown away.
By teaching the machine those boundaries we have done the same thing.
***
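Here is a concrete instance in the simplest case: two 1-D Gaussian classes with a shared standard deviation sigma. That setting is an assumption chosen for illustration, not part of the post. Setting p1 N(x; mu1, sigma) = p2 N(x; mu2, sigma) gives a single decision point, x* = (mu1 + mu2)/2 + sigma^2 ln(p1/p2)/(mu2 - mu1). The Python sketch computes x* from the full model and then classifies using only x*. Away from x* itself, it agrees with a classifier that keeps the whole densities.

```python
import math


def bayes_threshold(mu1, mu2, sigma, p1, p2):
    """Decision point x* where p1*N(x; mu1, sigma) == p2*N(x; mu2, sigma).

    Assumes two 1-D Gaussian classes with a shared sigma and mu1 != mu2.
    """
    return (mu1 + mu2) / 2 + sigma ** 2 * math.log(p1 / p2) / (mu2 - mu1)


def threshold_classify(x, threshold, mu1, mu2):
    """Classify using only x* and which side class 1 lies on."""
    if mu1 < mu2:
        return 1 if x < threshold else 2
    return 1 if x > threshold else 2


def full_bayes_classify(x, mu1, mu2, sigma, p1, p2):
    """Compare prior-weighted likelihoods directly, using the whole model.

    The Gaussian normalizing constant is shared by both classes, so it is dropped.
    """
    def likelihood(mu):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

    return 1 if p1 * likelihood(mu1) > p2 * likelihood(mu2) else 2
```

With equal priors the boundary is the midpoint (1.0 for means 0 and 2). With p1 = 0.8 it shifts to about 1.693, enlarging the class-1 region. Once x* is known, the densities are no longer needed.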

> 
> But of course, I may be off track here.  You certainly seem to hold the
> gen=bias => boundary cases implication in high regard.  Please explain if I
> have misunderstood.
> 
***
I believe a couple of points have been brought out in our discussion over the
past few weeks. In my *opinion*,
1) learning and memorization are two very different things.
2) learning implies generalization and rule-extraction. Memorization does not.
3) Biases of some sort are required to learn anything.
4) Learning is fastest with borderline patterns that force the machine
to pick up subtle differences between classes. But it also seems reasonable
that strikingly different examples also play an important role in learning.
5) Learnability should be defined in terms of a particular set of biases,
perhaps dependent on network architecture. (e.g. some things are just not
learnable by a particular network or machine)

Not bad.
***
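Point 5 can be illustrated with the 1-D task from Ron's example. A machine whose only hypotheses are single thresholds (class 1 on one side of a cut) can never label an interval correctly, however many examples it sees and however well they are chosen. A machine whose hypotheses are intervals can. The Python sketch below is a hypothetical brute-force check, not anything from the discussion. It only tests whether each hypothesis family can fit a given labeled set.

```python
def candidate_cuts(samples):
    """Cut points between sorted sample positions, plus one beyond each end.

    In one dimension these are enough to realize every distinct labeling a
    threshold or interval rule can produce on the samples.
    """
    xs = sorted(x for x, _ in samples)
    mids = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return [xs[0] - 1.0] + mids + [xs[-1] + 1.0]


def single_threshold_fits(samples):
    """True if some rule 'class 1 iff s*(x - t) > 0' labels every sample correctly."""
    return any(
        all((s * (x - t) > 0) == (label == 1) for x, label in samples)
        for t in candidate_cuts(samples)
        for s in (1, -1)
    )


def interval_fits(samples):
    """True if some rule 'class 1 iff lo < x < hi' labels every sample correctly."""
    cuts = candidate_cuts(samples)
    return any(
        all((lo < x < hi) == (label == 1) for x, label in samples)
        for lo in cuts
        for hi in cuts
        if lo < hi
    )
```

On [(-2.0, 2), (0.0, 1), (2.0, 2)], which is the origin-centered task in miniature, the threshold machine fails and the interval machine succeeds. Learnability here depends on the machine's biases, not on the examples.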

> By the way, has anybody looked at 2nd order bias learning as I have sketched it
> out here?  Thanks to Tony for pointing me in the right direction...

***
You're welcome. It's a lot of fun. I just have this vision of a bunch of
researchers quietly reading these messages and jotting down notes for
future work and papers. More people should join the discussion; none
of the five points above has been proven.
***

 ~ tony ~