Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!njin!princeton!phoenix!mbkennel
From: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel)
Newsgroups: comp.ai.neural-nets
Subject: Re: Size limits of BP (Was Re: ART and non-stationary environments)
Keywords: BP
Message-ID: <8128@phoenix.Princeton.EDU>
Date: 4 May 89 00:38:20 GMT
References: <18589@gatech.edu> <15313@eecae.UUCP> <18615@gatech.edu>
Reply-To: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel)
Organization: Princeton University, NJ
Lines: 51

In article <18615@gatech.edu) myke@gatech.UUCP (Mike Rynolds) writes:
)In article <15313@eecae.UUCP) frey@eecae.UUCP (Zachary Frey) writes:
))I am not familiar with ART, but I am familiar with back-propagation from
))the Rummelhart & McClelland PDP volumes, and I don't remember ever
))seeing anything about a size limit to networks implemented with back-
))propagation.  Could you elaborate?
))
)Try increasing the number of internal nodes without changing the input/output
)you train it on. If you were to simulate more complex input/output, an increased
)number of internal nodes would be necessary to learn the greater complexity.
)But even without greater complexity you will notice a rapid decrease in
)learning rate as a function of the number of internal nodes, and at a certain
)point, it stops learning all together.

Yes, it starts to memorize once you have significantly more free parameters
than examples.  I would think, though, that this is a fundamental limitation---
you can always fit a 5th degree polynomial through 5 points, for example.
The same sort of thing should apply in networks...(plug for my adviser's
research).  If you need to use significantly more free weights than examples
then you're wasting weights, or the functional representation is poor.
If you don't have enough examples, you won't be able to learn a complicated
function if the given examples don't map out the input space well enough.

)-- 
)Myke Reynolds
)School of Information & Computer Science, Georgia Tech, Atlanta GA 30332
)uucp:	...!{decvax,hplabs,ncar,purdue,rutgers}!gatech!myke
)Internet:	myke@gatech.edu

Something that I'd like to explore further is learning with a radial-basis
function network, i.e. a one-hidden layer network where the input layer's
input function is:

 	n_j =   sum_i  (o_i - w_ij)^2;  o_j = 1/(1+n_j^2) e.g.

instead of the conventional  n_j = sum_i o_i * w_ij.

You can learn the output layer of this network using a guaranteed
conventional algorithm (linear least-squares; singular-value-decomposition)
once you've selected the centers, i.e. the first layer of weights, with
k-means clustering for example.

With one hidden layer, this network can perform complicated nonlinear
transformations, unlike the simple perceptron.

For predicting chaotic time series, where the inherent locality of the
functional representation is an advantage, this method is more accurate
and faster to converge, I've found.

Matt Kennel
mbkennel@phoenix.princeton.edu