Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!agate!eos!riacs!danforth
From: danforth@riacs.edu (Douglas G. Danforth)
Newsgroups: comp.ai.neural-nets
Subject: Re: : Step Function. Biases are necessary
Keywords: learning,generalization
Message-ID: <1693@hydra.riacs.edu>
Date: 13 Sep 89 16:39:54 GMT
Reply-To: danforth@hydra.riacs.edu.UUCP (Douglas G. Danforth)
Organization: Research Institute for Advanced Computer Science
Lines: 49

Tony Russo writes:

>I believe a couple of points have been brought out in our discussion over the
>past few weeks. In my *opinion*,
>1) learning and memorization are two very different things.
>2) learning implies generalization and rule-extraction. Memorization does not.
>3) Biases of some sort are required to learn anything.
>4) Learning is fastest with borderline patterns that require the machine
>to differentiate subtle differences in classes. But, it also seems reasonable
>that strikingly different examples also play an important role in learning.
>5) Learnability should be defined in terms of a particular set of biases,
>perhaps dependent on network architecture. (e.g. some things are just not
>learnable by a particular network or machine)


In regard to points (1) and (2).

     In a standard random access memory where every possible address is
physically present (a 24-bit address => 16 MB) there is no generalization.
Each slot is filled independently of every other slot.  However, with very
long addresses, say 1,000 bits, it is not possible to provide a slot for each
of the 2^1000 possible addresses, and yet a "memory" can still be constructed
for this case.  The memory is sparse in the address space: only a sample of
the possible addresses is physically present.  These "hard locations" act as
repositories for information.  A write distributes the information among the
set of hard locations that are "near" the desired (but not physically
present) address.  One can read from an arbitrary address by "pooling" the
information stored in the "nearby" hard locations and then thresholding the
result.
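
     The write and read operations just described can be sketched in a few
lines of Python.  This is only an illustration under my own assumptions, not
a reference implementation: "near" is taken to mean within a fixed Hamming
distance, each hard location keeps one up/down counter per bit, and the
sizes (N_BITS, N_HARD, RADIUS) and the names SparseMemory and flip_bits are
made up for the example.  The words are 256 bits rather than 1,000 so that
it runs quickly.

```python
import random

N_BITS = 256     # word length (assumed; the text above talks about ~1,000 bits)
N_HARD = 2000    # number of hard locations physically present (assumed)
RADIUS = 116     # Hamming radius that counts as "near" (assumed)


def hamming(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")


def flip_bits(word, k, rng, n_bits=N_BITS):
    """Return a copy of word with k distinct, randomly chosen bits inverted."""
    for pos in rng.sample(range(n_bits), k):
        word ^= 1 << pos
    return word


class SparseMemory:
    """Hypothetical sparse memory: random hard locations with bit counters."""

    def __init__(self, n_bits=N_BITS, n_hard=N_HARD, radius=RADIUS, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.radius = radius
        # Only a random sample of the 2**n_bits addresses is present.
        self.addresses = [rng.getrandbits(n_bits) for _ in range(n_hard)]
        self.counters = [[0] * n_bits for _ in range(n_hard)]

    def near(self, address):
        """Indices of the hard locations within the radius of address."""
        return [i for i, a in enumerate(self.addresses)
                if hamming(a, address) <= self.radius]

    def write(self, address, word):
        """Distribute word over every hard location near address."""
        for i in self.near(address):
            row = self.counters[i]
            for bit in range(self.n_bits):
                row[bit] += 1 if (word >> bit) & 1 else -1

    def read(self, address):
        """Pool the counters of the nearby locations, then threshold at zero."""
        rows = [self.counters[i] for i in self.near(address)]
        word = 0
        for bit in range(self.n_bits):
            if sum(row[bit] for row in rows) > 0:
                word |= 1 << bit
        return word


rng = random.Random(1)
memory = SparseMemory()
patterns = [rng.getrandbits(N_BITS) for _ in range(3)]
for p in patterns:
    memory.write(p, p)                  # each word is stored at its own address

cue = flip_bits(patterns[0], 10, rng)   # a corrupted version of pattern 0
recalled = memory.read(cue)
print("cue is off by", hamming(cue, patterns[0]), "bits")
print("recall is off by", hamming(recalled, patterns[0]), "bits")
```

     Note that the cue address is almost certainly not a hard location
itself; the read works only because the cue shares many nearby hard
locations with the address that was written.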

     The reason for this preamble is to show that reading from (presenting
an input pattern to) a sparse distributed memory (a neural net) can indeed
produce output which is a "generalization".  The generalization can take
the form of: (A) what is the most similar thing to this pattern that I have
seen before? or (B) what is the Platonic ideal of this fuzzy pattern?
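
     Form (B) can be illustrated with the pooling-and-thresholding step
taken in isolation.  The sketch below is a deliberate simplification and an
assumption on my part: it leaves out the hard locations entirely and simply
takes a bitwise majority vote over noisy examples of a prototype that is
never itself presented.  The names pool_and_threshold and flip_bits and the
sizes used are invented for the example.

```python
import random

N_BITS = 256          # word length (assumed for the example)


def hamming(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")


def flip_bits(word, k, rng, n_bits=N_BITS):
    """Return a copy of word with k distinct, randomly chosen bits inverted."""
    for pos in rng.sample(range(n_bits), k):
        word ^= 1 << pos
    return word


def pool_and_threshold(words, n_bits=N_BITS):
    """Sum +1/-1 votes per bit over all words and keep the bits that win."""
    result = 0
    for bit in range(n_bits):
        votes = sum(1 if (w >> bit) & 1 else -1 for w in words)
        if votes > 0:
            result |= 1 << bit
    return result


rng = random.Random(7)
prototype = rng.getrandbits(N_BITS)     # the "ideal"; never pooled directly
examples = [flip_bits(prototype, 25, rng) for _ in range(15)]   # fuzzy copies

ideal = pool_and_threshold(examples)
print("each example is off by 25 bits")
print("pooled result is off by", hamming(ideal, prototype), "bits")
```

     The pooled word should lie closer to the prototype than any single
example does, because the noise in the individual examples is independent
and tends to cancel while the common structure adds up.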

     When dealing with very high-dimensional spaces it becomes difficult to
dismiss the generalization characteristics of a sparse distributed memory,
for they begin to show animal-like capabilities.  Most neural net research
to date has focused on very small numbers of nodes: input, hidden, and output.
For these small cases, I agree, the utility of memory may not seem great.

     By "rule extraction" I assume you mean something analogous to human
conscious thought, where one can articulate the "rule" that one has
discovered.  This is an ongoing area of debate.  Is it necessary to
"interpret" the connection weights, or just to evaluate the performance of
the system?  IMHO, that depends upon your goals.


Doug Danforth
danforth@riacs.edu