Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!agate!eos!riacs!danforth
From: danforth@riacs.edu (Douglas G. Danforth)
Newsgroups: comp.ai.neural-nets
Subject: Re: : Step Function. Biases are necessary
Keywords: learning,generalization
Message-ID: <1693@hydra.riacs.edu>
Date: 13 Sep 89 16:39:54 GMT
Reply-To: danforth@hydra.riacs.edu.UUCP (Douglas G. Danforth)
Organization: Research Institute for Advanced Computer Science
Lines: 49

Tony Russo writes:
>I believe a couple of points have been brought out in our discussion over the
>past few weeks. In my *opinion*,
>1) learning and memorization are two very different things.
>2) learning implies generalization and rule-extraction. Memorization does not.
>3) Biases of some sort are required to learn anything.
>4) Learning is fastest with borderline patterns that require the machine
>to differentiate subtle differences in classes. But, it also seems reasonable
>that strikingly different examples also play an important role in learning.
>5) Learnability should be defined in terms of a particular set of biases,
>perhaps dependent on network architecture. (e.g. some things are just not
>learnable by a particular network or machine)

In regard to points (1) and (2): in a standard random access memory
where all possible addresses can be represented (24-bit address => 16MB)
there is no generalization. Each slot is filled independently of every
other slot. However, when addresses are very long, say 1,000 bits, it is
not possible to physically represent every possible address, and yet a
"memory" can still be constructed for this case. The memory is sparse in
the address space: only a sampling of the possible memory addresses can
be present. These "hard locations" act as repositories for information
written into the memory: a write distributes the information among the
set of hard locations which are "near" the desired (but not physically
present) address.
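
For concreteness, here is a minimal sketch of such a memory. The post
gives no implementation, so the details are assumptions: 256-bit
addresses instead of 1,000, 2,000 randomly chosen hard locations, a
Hamming radius of 116 as the meaning of "near", and a signed counter
per bit at each hard location (the scheme usually described for
Kanerva-style sparse distributed memories). All names are hypothetical.
Reading pools the counters of the nearby hard locations and thresholds
each bit at zero.

```python
import random

N_BITS = 256    # address/word length (assumed; the post's example is ~1,000)
N_HARD = 2000   # number of hard locations physically present (assumed)
RADIUS = 116    # Hamming radius that defines "near" (assumed)


def hamming(a, b):
    """Number of bit positions in which two N_BITS-bit integers differ."""
    return bin(a ^ b).count("1")


class SparseDistributedMemory:
    def __init__(self, rng):
        # A random sample of the 2**N_BITS possible addresses.
        self.addresses = [rng.getrandbits(N_BITS) for _ in range(N_HARD)]
        # One signed counter per bit at every hard location.
        self.counters = [[0] * N_BITS for _ in range(N_HARD)]

    def _near(self, address):
        """Indices of hard locations within RADIUS of the given address."""
        return [i for i, hard in enumerate(self.addresses)
                if hamming(hard, address) <= RADIUS]

    def write(self, address, word):
        """Distribute the word over every hard location near the address."""
        for i in self._near(address):
            row = self.counters[i]
            for bit in range(N_BITS):
                row[bit] += 1 if (word >> bit) & 1 else -1

    def read(self, address):
        """Pool the counters of nearby hard locations, then threshold at 0."""
        sums = [0] * N_BITS
        for i in self._near(address):
            row = self.counters[i]
            for bit in range(N_BITS):
                sums[bit] += row[bit]
        word = 0
        for bit, total in enumerate(sums):
            if total > 0:
                word |= 1 << bit
        return word


def flip_bits(word, count, rng):
    """Return the word with `count` distinct, randomly chosen bits inverted."""
    for bit in rng.sample(range(N_BITS), count):
        word ^= 1 << bit
    return word


def demo(seed=0):
    """Store three patterns at their own addresses, then read from a noisy cue."""
    rng = random.Random(seed)
    memory = SparseDistributedMemory(rng)
    patterns = [rng.getrandbits(N_BITS) for _ in range(3)]
    for pattern in patterns:
        memory.write(pattern, pattern)
    cue = flip_bits(patterns[0], 20, rng)  # an address that was never written
    return patterns[0], cue, memory.read(cue)
```

Calling `demo()` returns the stored pattern, the corrupted cue, and the
word read back at the cue; comparing them with `hamming` shows how far
the read has moved from the cue toward the stored pattern. The radius
here activates roughly 7% of the hard locations per access, which is a
choice made for this sketch, not a figure from the post.
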
One can read from an arbitrary address by "pooling" the information
stored in the "nearby" hard locations and then thresholding the result.

The reason for this preamble is to show that reading from (presenting
an input pattern to) a sparse distributed memory (a neural net) can
indeed produce output which is a "generalization". The generalization
can take the form of:

   (A) what is the most similar thing to this pattern that I have seen
       before, or
   (B) what is the Platonic ideal of this fuzzy pattern?

When dealing with very high-dimensional spaces it becomes difficult to
dismiss the generalization characteristics of a sparse distributed
memory, for they begin to show animal-like capabilities. Most neural net
research to date has focused on very small numbers of nodes: input,
hidden, and output. For these small cases, I agree, the utility of
memory may not seem great.

By "rule extraction" I assume you mean something analogous to conscious
human thought, where one can articulate the "rule" that one has
discovered. This is an ongoing area of debate. Is it necessary to
"interpret" the connection weights, or just to evaluate the performance
of the system? IMHO, that depends upon your goals.

Doug Danforth
danforth@riacs.edu