Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!think!ames!sdcsvax!ucbvax!dartmouth.EDU!sandon
From: sandon@dartmouth.EDU (Peter Sandon)
Newsgroups: comp.ai.digest
Subject: Re: Neural Networks & Unaligned fields
Message-ID: <8709160638.AA11145@ucbvax.Berkeley.EDU>
Date: Fri, 11-Sep-87 22:20:38 EDT
Article-I.D.: ucbvax.8709160638.AA11145
Posted: Fri Sep 11 22:20:38 1987
Date-Received: Fri, 18-Sep-87 05:37:24 EDT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 47
Approved: ailist@stripe.sri.com

I did not read the Byte article either. However, assuming that the network under discussion had no way to represent the similarity relationship among different nodes that represent translated versions of the same feature, it is not surprising that it would have a difficult time generalizing from a given pattern to an 'unaligned' version of that pattern.

Rumelhart pointed out to Banks that what is needed are many sets of units having similar weight patterns, that is, weights that are sensitive to translated versions of a given pattern. In addition, the relationship between these similar units must be represented. Rumelhart suggests adding units as needed but does not mention how to relate these additional units to the trained unit.

Fukushima did something similar in his Neocognitron, by broadcasting a learned weight set to an entire layer of units which were then all connected to an OR unit. This OR unit then represented the fact that all the units represented the same feature, modulo translation. Of course, broadcasting weights requires more global control than many would like, and the OR is not quite the relation we want for patterns of any complexity.

In 1981, Hinton suggested a means of separately representing shape and translation in a network, such that 'unaligned' patterns could be recognized.
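
As a rough illustration of the broadcast-weights-plus-OR idea attributed to Fukushima above, here is a minimal Python sketch. It is not the Neocognitron itself: the function names, the +1/-1 weight template, the 3x3 'T', and the exact-match threshold are all assumptions made for the example.

```python
# Hypothetical sketch: one shared ("broadcast") weight template is applied at
# every offset of the image, and the resulting units feed a single OR unit.

def unit_response(image, weights, row, col, threshold):
    """Return 1 if the weighted sum of the window at (row, col) reaches threshold."""
    total = 0
    for r, weight_row in enumerate(weights):
        for c, w in enumerate(weight_row):
            total += w * image[row + r][col + c]
    return 1 if total >= threshold else 0

def translation_invariant_detect(image, weights, threshold):
    """OR over all units that share the same weights, one unit per offset."""
    wh, ww = len(weights), len(weights[0])
    ih, iw = len(image), len(image[0])
    return int(any(
        unit_response(image, weights, r, c, threshold)
        for r in range(ih - wh + 1)
        for c in range(iw - ww + 1)
    ))

def place(pattern, height, width, top, left):
    """Embed a small binary pattern in an otherwise blank height x width grid."""
    grid = [[0] * width for _ in range(height)]
    for r, row in enumerate(pattern):
        for c, v in enumerate(row):
            grid[top + r][left + c] = v
    return grid

T = [[1, 1, 1],
     [0, 1, 0],
     [0, 1, 0]]

# Assumed weights: +1 where the 'T' is on, -1 where it is off,
# so stray pixels inside the window lower the response.
T_WEIGHTS = [[1 if v else -1 for v in row] for row in T]
THRESHOLD = 5  # the 'T' has 5 on-pixels; require all of them and no extras

image_a = place(T, 6, 6, 0, 0)   # 'T' in the top-left corner
image_b = place(T, 6, 6, 2, 3)   # same 'T', translated
blank = [[0] * 6 for _ in range(6)]
```

With these definitions, translation_invariant_detect(image_a, T_WEIGHTS, THRESHOLD) and the same call on image_b both return 1, while the blank image gives 0. The any() is the OR relation: it reports that the feature is present somewhere but discards where, which is one way to see why OR alone is a weak relation for more complex patterns.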
In my thesis, I implemented a modified version of Hinton's network scheme, in order to demonstrate that a network can generalize object recognition across translation. The network that I implemented is five layers deep, which proved too much for standard backpropagation (the generalized delta rule, or GDR) and for my extensions to the GDR. However, generalization across translation can be demonstrated in a subnetwork of this network. I am working on further improvements to backpropagation that will allow the entire network to be trained.

It is important to recognize that there are many useless generalizations that might be made, and a few useful ones. The Hamming distance between two 'T's that are offset from one another is much greater than that between a 'T' and a 'C' that is offset such that it overlaps much of the 'T'. What is the 'correct' generalization to be made when trying to classify these patterns? In order to get the desired generalization, the network must be biased toward developing intermediate representations in which the Hamming distances between within-class patterns are small compared to those between patterns of different classes. Generalization based on similarity will then be appropriate. Without such biases, 'good' generalization would be quite surprising.

--Pete Sandon
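
The Hamming-distance point about the 'T' and 'C' patterns can be checked on small binary grids. The sketch below is only an illustration: the 3x3 letter shapes, the 7x7 grid, and the chosen offsets are assumptions, and the 'C' is simply placed on top of the 'T' so that the two overlap as much as these shapes allow.

```python
# Hypothetical sketch: raw pixel-level Hamming distance between an offset pair
# of 'T's versus a 'T' and an overlapping 'C'.

def place(pattern, size, top, left):
    """Embed a small binary pattern in an otherwise blank size x size grid."""
    grid = [[0] * size for _ in range(size)]
    for r, row in enumerate(pattern):
        for c, v in enumerate(row):
            grid[top + r][left + c] = v
    return grid

def hamming(a, b):
    """Count the grid positions at which two equal-sized binary grids differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

T = [[1, 1, 1],
     [0, 1, 0],
     [0, 1, 0]]

C = [[1, 1, 1],
     [1, 0, 0],
     [1, 1, 1]]

t_here = place(T, 7, 1, 1)      # reference 'T'
t_shifted = place(T, 7, 3, 3)   # same 'T', offset so that the two do not overlap
c_overlap = place(C, 7, 1, 1)   # 'C' placed over the reference 'T'

t_vs_shifted_t = hamming(t_here, t_shifted)
t_vs_c = hamming(t_here, c_overlap)
```

Under these assumptions t_vs_shifted_t is 10 and t_vs_c is 4, so a classifier that generalizes on raw pixel similarity would group the 'T' with the 'C' rather than with its own translated copy.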