Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!think!ames!sdcsvax!ucbvax!dartmouth.EDU!sandon
From: sandon@dartmouth.EDU (Peter Sandon)
Newsgroups: comp.ai.digest
Subject: Re: Neural Networks & Unaligned fields
Message-ID: <8709160638.AA11145@ucbvax.Berkeley.EDU>
Date: Fri, 11-Sep-87 22:20:38 EDT
Article-I.D.: ucbvax.8709160638.AA11145
Posted: Fri Sep 11 22:20:38 1987
Date-Received: Fri, 18-Sep-87 05:37:24 EDT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 47
Approved: ailist@stripe.sri.com


I did not read the Byte article either. However, assuming that
the network under discussion had no way to represent the similarity
relationship among different nodes that represent translated
versions of the same feature, it is not surprising that it would
have a difficult time generalizing from a given pattern to
an 'unaligned' version of that pattern.

Rumelhart pointed out to Banks that what is needed are many sets
of units having similar weight patterns, that is, weights that
are sensitive to translated versions of a given pattern. In addition,
the relationship between these similar units must be represented.
Rumelhart suggested adding units as needed but did not say how these
additional units should be related to the trained unit. Fukushima did
something similar in his Neocognitron by broadcasting a learned
weight set to an entire layer of units, all of which were then connected
to an OR unit. This OR unit represented the fact that all of those
units detected the same feature, modulo translation. Of course,
broadcasting weights requires more global control than many would
like, and the OR is not quite the relation we want for patterns of
any complexity.
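
A minimal Python sketch of the weight-broadcasting idea follows (the post itself contains no code). It assumes a 1-D binary input, a thresholded weighted sum for each replicated unit, and Python's built-in any() as the OR unit. The function names, weights, and threshold are hypothetical, and this is a simplification of the Neocognitron, not Fukushima's actual cell model.

```python
def detector_outputs(x, weights, threshold):
    """Apply one shared ("broadcast") weight set at every position of x.

    Each position gets its own unit, but all units use identical weights,
    so each one responds to the same feature at a different translation.
    """
    k = len(weights)
    outputs = []
    for i in range(len(x) - k + 1):
        s = sum(w * xi for w, xi in zip(weights, x[i:i + k]))
        outputs.append(1 if s >= threshold else 0)
    return outputs


def or_unit(outputs):
    """Fire if any replicated unit fired: 'this feature is present somewhere'."""
    return int(any(outputs))


WEIGHTS = [1, -1, 1]  # hypothetical "learned" weights tuned to the pattern 1 0 1
THRESHOLD = 2         # hypothetical firing threshold

aligned = [1, 0, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 0, 1]
other = [1, 1, 1, 0, 0, 0, 0]

for x in (aligned, shifted, other):
    outs = detector_outputs(x, WEIGHTS, THRESHOLD)
    print(outs, or_unit(outs))
# [1, 0, 0, 0, 0] 1
# [0, 0, 0, 0, 1] 1
# [0, 0, 0, 0, 0] 0
```

Both the aligned and the shifted pattern turn the OR unit on, even though different replicated units fire. The OR output is also the same whether the feature occurs once, twice, or anywhere at all. This is one way to see why a bare OR discards the positional relations that more complex patterns depend on.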

In 1981, Hinton suggested a means of separately representing shape and
translation in a network, such that 'unaligned' patterns could be
recognized. In my thesis, I implemented a modified version of that
network scheme to demonstrate that a network can generalize
object recognition across translation. The network that I implemented
is five layers deep, which proved too much for standard backpropagation
(the generalized delta rule) and for my extensions to the GDR.
However, generalization across translation can be demonstrated in
a subnetwork of this network. I am working on further improvements
to backpropagation that will allow the entire network to be trained.

It is important to recognize that there are many useless 
generalizations that might be made, and a few useful ones. The
Hamming distance between two 'T's that are offset from one another
is much greater than that between a 'T' and a 'C' that is offset such
that it overlaps much of the 'T'. What is the 'correct' generalization
to be made when trying to classify these patterns? In order to get
the desired generalization, the network must be biased toward
developing intermediate representations in which the Hamming distances
between patterns of the same class are small compared to those between
patterns of different classes. Generalization based
on similarity will then be appropriate. Without such biases, 'good'
generalization would be quite surprising.
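
The 'T' versus 'C' comparison can be checked with a small worked example. The 3x3 glyphs, the 5x8 canvas, and the helper names `place` and `hamming` below are hypothetical choices made for illustration; Hamming distance here is simply the number of pixels at which two binary images differ.

```python
# Hypothetical 3x3 glyphs (1 = "on" pixel), invented for illustration.
T = [[1, 1, 1],
     [0, 1, 0],
     [0, 1, 0]]
C = [[1, 1, 1],
     [1, 0, 0],
     [1, 1, 1]]


def place(glyph, row, col, height=5, width=8):
    """Paste a glyph onto a blank canvas with its top-left corner at (row, col)."""
    canvas = [[0] * width for _ in range(height)]
    for r, line in enumerate(glyph):
        for c, bit in enumerate(line):
            canvas[row + r][col + c] = bit
    return canvas


def hamming(a, b):
    """Count the pixel positions at which two equal-sized binary images differ."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


t = place(T, 0, 0)
print(hamming(t, place(T, 0, 1)))  # T vs. T shifted one column: 6
print(hamming(t, place(T, 0, 3)))  # T vs. T shifted three columns: 10
print(hamming(t, place(C, 0, 1)))  # T vs. C shifted one column: 4
```

With these glyphs, shifting the 'T' by even one column puts it farther from the original 'T' (distance 6) than a 'C' shifted by one column is (distance 4). A classifier that generalizes by raw pixel-level similarity would therefore tend to group the offset 'C' with the 'T'.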

--Pete Sandon