Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!sgi!arisia!kanga!chrisley
From: chrisley@kanga.uucp (Ron Chrisley UNTIL 10/3/88)
Newsgroups: comp.ai.neural-nets
Subject: Re: : Step Function. Biases are necessary
Keywords: learning,generalization
Message-ID: <2934@arisia.Xerox.COM>
Date: 12 Sep 89 06:59:09 GMT
References: <1060@rex.cs.tulane.edu> <6980@sdcsvax.UCSD.Edu> <2795@arisia.Xerox.COM> <1829@cbnewsl.ATT.COM>
Sender: news@arisia.Xerox.COM
Reply-To: k.karn@macbeth.stanford.edu (Ron Chrisley)
Organization: Xerox Palo Alto Research Center
Lines: 78

I wrote:

> [...] I do not see how the fact that
> generalization = bias implies the optimality of learning the boundary
> conditions, and would be very interested in having you elaborate on why you
> think it might.

Then Tony Russo said:

"My reply to this is to give a simplified, one-dimensional case... A boundary is most efficiently (read: learning will be faster) defined by its location in n-dimensional space. Since neural nets don't learn this way, the next most efficient definition of a boundary is obtained by giving examples of two items very (infinitesimally) close to the boundary but on different sides of it. In this way, in 1-D space for example, two points can define a boundary. Those two points or examples are the most important ones to present to the net. If, for instance, we wanted to teach the concept of negative and positive (zero is the boundary), -1 and +1 (in integer space) would be a sufficient set of examples (given, of course, some definition of bias). Conversely, examples like -102312341 and +823456 are not very helpful."

I claim that although there might be algorithms that learn generalization biases for which the boundary cases provide the quickest learning, there are also algorithms for which this is not the case. For instance, some algorithms may learn biases better if you provide exemplars. I know this is exactly what you claim is not the case, but I don't yet see an argument.
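To make Tony's 1-D claim concrete, here is a minimal sketch of one possible bias for learning a threshold from two labelled points. The "midpoint" rule below is an assumption chosen for illustration, not the bias Tony had in mind: it places the boundary halfway between the largest negative example and the smallest positive example, and it assumes the examples are separable with all negatives below all positives. The function name `midpoint_threshold` is hypothetical.

```python
# Hypothetical "midpoint" bias (an assumption, not Tony's stated bias):
# put the 1-D boundary halfway between the closest examples of opposite class.


def midpoint_threshold(examples: list[tuple[float, int]]) -> float:
    """Return the boundary between the largest negative-class example and the
    smallest positive-class example. Labels are -1 (negative) or +1 (positive);
    assumes every negative example lies below every positive example."""
    largest_neg = max(x for x, y in examples if y == -1)
    smallest_pos = min(x for x, y in examples if y == +1)
    return (largest_neg + smallest_pos) / 2


# Compare Tony's boundary pair, a wide symmetric pair, and Tony's "unhelpful" pair.
for neg, pos in [(-1, 1), (-100000, 100000), (-102312341, 823456)]:
    examples = [(neg, -1), (pos, +1)]
    print((neg, pos), "->", midpoint_threshold(examples))
```

Under this particular bias, -1:1 and -100000:100000 both put the boundary at 0.0, while the asymmetric pair -102312341 and +823456 puts it at -50744442.5. So whether examples near the boundary are needed depends on the bias, not just on the data.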
What is the difference between -1:1 and -100000:100000? If there is a difference in the quality of bias learning, I am sure it depends on some assumptions about the bias-learning algorithm you have in mind, or about the nature of the data. "The boundary is best" does not seem to hold for arbitrary learning algorithms, and especially not for particular generalization tasks.

Consider a 1D task where everything within distance D of the origin is in cat 1, and all points outside this region are in cat 2. Now consider the following way of learning bias: start with the bias that, after seeing n samples, you will categorize everything within radius r of any of the samples as the class of those samples, r being small. Then r is increased in a least-squares way until generalization error is minimized.

Clearly, it would be best to use samples near the origin to train this combination of task and bias-learning algorithm. If samples near the boundaries are used, there will be only a small error in estimated generalization, resulting in small changes to r, which would converge to the following classification: cat 1 if the sample is within epsilon (the small value of r) of +D or -D. But if samples from the interiors of the classes are used, the estimated generalization error will better match the actual error, which will initially be high, resulting in an increase of r. Thus we end up with the following classification: cat 1 if the sample is within D of 0.

Don't get me wrong: I do think that learning near the boundaries, à la LVQ2, is a good idea. But I don't think it is a good idea for all tasks; I am not convinced that it is a good way to learn 2nd-order *biases* (as opposed to 1st-order distributions); and even if it is good for that, I question whether that has anything to do with the fact that generalization = bias, as opposed to the Bayesian arguments Prof. Kohonen gives.
If it were true for Bayesian reasons, you would probably also be assuming that the bias learning is performed after you already have a relatively good solution to the problem. The reason it was not a good idea in my example is that the bias-learning algorithm there needs information about the entire distribution; looking only at the boundaries throws that information away.

But of course, I may be off track here. You certainly seem to hold the gen=bias => boundary cases implication in high regard. Please explain if I have misunderstood.

Ron

By the way, has anybody looked at 2nd-order bias learning as I have sketched it out here? Thanks to Tony for pointing me in the right direction...
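The radius-based bias learner in the 1D example can be simulated in a few lines. This is a minimal sketch under several assumptions that are not in the post: points lie on an integer grid from -300 to 300, D = 100, cat 2 is the default label (so only cat-1 samples claim a radius-r region), error counted exactly on the grid stands in for "estimated generalization error", and r grows by 1 while that error strictly decreases instead of following a least-squares update. All names (`classify`, `grid_error`, `learn_radius`) are hypothetical.

```python
# Hypothetical simulation of the radius-r bias learner from the 1D example.
# Assumptions (not from the post): integer grid [-300, 300], D = 100,
# cat 2 as the default label, exact grid error in place of estimated
# generalization error, and r grown by 1 while that error strictly drops.

D = 100
GRID = range(-300, 301)


def true_label(x: int) -> int:
    """Target concept: cat 1 within distance D of the origin, else cat 2."""
    return 1 if abs(x) < D else 2


def classify(x: int, samples: list[tuple[int, int]], r: int) -> int:
    """Cat 1 if x lies within r of any cat-1 sample; otherwise default to cat 2."""
    near_cat1 = any(label == 1 and abs(x - s) <= r for s, label in samples)
    return 1 if near_cat1 else 2


def grid_error(samples: list[tuple[int, int]], r: int) -> int:
    """Number of grid points that the radius-r classifier labels wrongly."""
    return sum(classify(x, samples, r) != true_label(x) for x in GRID)


def learn_radius(samples: list[tuple[int, int]], r: int = 1, max_r: int = 300) -> tuple[int, int]:
    """Start with a small r and grow it while the grid error strictly decreases.
    Returns the final radius and its error count."""
    err = grid_error(samples, r)
    while r < max_r:
        next_err = grid_error(samples, r + 1)
        if next_err >= err:
            break
        r, err = r + 1, next_err
    return r, err


# Samples straddling the boundaries at -D and +D, versus samples from class interiors.
boundary = [(-D - 1, 2), (-D + 1, 1), (D - 1, 1), (D + 1, 2)]
interior = [(-250, 2), (0, 1), (250, 2)]

print(learn_radius(boundary))  # (1, 197)
print(learn_radius(interior))  # (99, 0)
```

In this sketch, the boundary samples leave the grid error flat at 197 as r grows (each step adds as many false cat-1 points outside D as correct ones inside), so r gets no useful signal and stops at 1: cat 1 only right around ±D. With a cat-1 sample at the origin, the error falls steadily until r = 99, which on this integer grid is "cat 1 if within D of 0". The mechanism differs from the post's estimated-error argument, but it illustrates the same point: this learner needs information from the class interiors.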