Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!ucsd!ucbvax!agate!eos!riacs!danforth From: danforth@riacs.edu (Douglas G. Danforth) Newsgroups: comp.ai.neural-nets Subject: KNN (was Re: Step function) Message-ID: <1675@hydra.riacs.edu> Date: 6 Sep 89 16:39:40 GMT Reply-To: danforth@hydra.riacs.edu.UUCP (Douglas G. Danforth) Organization: Research Institute for Advanced Computer Science Lines: 56 Newsgroups: comp.ai.neural-nets Subject: KNN (was Re: : Step Function) Keywords: bias in learning,generalization Ron Chrisley writes: " So either you abandon memorization (since there are usually an infinite # of inputs anyway, and you don't know in advance what an effective discretization of the input space would be) and therefore use a bias, or, in the cases where memorization is possible, you will have to look at more than one instance of each input (I'm wondering: Isn't this how KNN or k-means classifiers, or even codebooks, work?). But what will be the proper number of instances to look at? That cannot be determined in adavance, so that may be yet another way bias can creep in... Ron Chrisley " ============================================================================= By using a large (256 bit) but finite discrete representation of, say speech, memorization is still possible. The continuous-to-discrete mapping does introduce a bias (not too bad if the bit pattern size is large). Large bit spaces can be handled by large numbers of hidden nodes which act as sponges to soak up the information in the input patterns. The output pattern (class label) is then the pooled decision of locally activated hidden units. A "rule of thumb" one can use is that the number of activated units should be approximately the square root of the number of hidden units. The above comments apply to sparse distributed memories. Since Ron raised the issue of K nearest neighbor, I would like to put forth a challenge to the Net: POSTULATE: K Nearest Neighbor will beat any Artificial Neural Network as a PRACTICAL alternative. I stress the word "practical". There are published results (Xerox PARC) that have ANN's beating out KNN's for some experiments. The issue as I see it: is it worth the effort to perfect ANN learning rules when KNN is fast, simple, and very good on the average? Disadvantages of KNN: little biological plausibility, little "insight" gained by using KNN. Note, however, that by using KNN with "train only on error" the size of memory can be greatly reduced and the boundaries are where the majority of the samples are placed. With K small but larger than 1, the effects of "label errors" can be reduced (averaged out). For these reasons, I find it difficult to dismiss KNN. Comments? ----------------------------------- Doug Danforth danforth@riacs.edu -----------------------------------