Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!ucsd!ucbvax!agate!eos!riacs!danforth
From: danforth@riacs.edu (Douglas G. Danforth)
Newsgroups: comp.ai.neural-nets
Subject: KNN (was Re: Step function)
Message-ID: <1675@hydra.riacs.edu>
Date: 6 Sep 89 16:39:40 GMT
Reply-To: danforth@hydra.riacs.edu.UUCP (Douglas G. Danforth)
Organization: Research Institute for Advanced Computer Science
Lines: 56

Newsgroups: comp.ai.neural-nets
Subject: KNN (was Re: : Step Function)
Keywords: bias in learning,generalization

Ron Chrisley writes:

"
So either you abandon memorization (since there are usually an infinite # of
inputs anyway, and you don't know in advance what an effective discretization
of the input space would be) and therefore use a bias, or, in the cases where
memorization is possible, you will have to look at more than one instance of
each input (I'm wondering:  Isn't this how KNN or k-means classifiers, or even
codebooks, work?).

But what will be the proper number of instances to look at?  That cannot be
determined in adavance, so that may be yet another way bias can creep in...

Ron Chrisley
"
=============================================================================
    By using a large (256 bit) but finite discrete representation of, say
speech, memorization is still possible.  The continuous-to-discrete mapping
does introduce a bias (not too bad if the bit pattern size is large).  Large
bit spaces can be handled by large numbers of hidden nodes which act as sponges
to soak up the information in the input patterns.  The output pattern (class
label) is then the pooled decision of locally activated hidden units.  A "rule
of thumb" one can use is that the number of activated units should be
approximately the square root of the number of hidden units.  The above
comments apply to sparse distributed memories.

    Since Ron raised the issue of K nearest neighbor, I would like to put forth
a challenge to the Net:


POSTULATE: K Nearest Neighbor will beat any Artificial Neural Network as
	   a PRACTICAL alternative.


    I stress the word "practical".  There are published results (Xerox PARC) that
have ANN's beating out KNN's for some experiments.  The issue as I see it:  is
it worth the effort to perfect ANN learning rules when KNN is fast, simple, and
very good on the average?

    Disadvantages of KNN:  little biological plausibility, little "insight"
gained by using KNN.

    Note, however, that by using KNN with "train only on error" the size of memory
can be greatly reduced and the boundaries are where the majority of the samples
are placed.  With K small but larger than 1, the effects of "label errors" can be
reduced (averaged out).  For these reasons, I find it difficult to dismiss KNN.

Comments?
-----------------------------------
Doug Danforth
danforth@riacs.edu
-----------------------------------