Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!ucsd!sdcc6!beowulf!pluto
From: pluto@beowulf.ucsd.edu (Mark Plutowski)
Newsgroups: comp.ai.neural-nets
Subject: Re: Back-Propagation Weight Initialization
Message-ID: <pluto.660964490@beowulf>
Date: 12 Dec 90 01:14:50 GMT
References: <1325@helens.Stanford.EDU> <1990Dec11.091222.3501@neural.dynas.se>
Sender: news@sdcc6.ucsd.edu
Lines: 32
Nntp-Posting-Host: beowulf.ucsd.edu

egel@neural.dynas.se (Peter Egelberg) writes:

>The improvement in learning speed sounds fine. But what about generalization.
>Does weight initialization improve generalization?
   			. . .
>I'm not saying that learning speed is unimportant. I'm saying that
>generalization is a greater problem when using neural networks in real world
>applications.

>-- 
>Peter Egelberg			E-mail:	egel@neural.dynas.se
>Neural AB			Phone:	+46 46 11 00 90
>Otto Lindbladsv. 5
>223 65 LUND, SWEDEN

True, many people are more concerned with the quality of the 
fit (that is, the accuracy of generalization, in your terms)
than with learning time.  

IMHO, the initial choice of weights probably has a significant 
effect on the quality of the fit achieved in a particular 
learning run, unless we use a learning rule that is insensitive 
to the initial parameterization of the network function.  Gradient 
descent, in and of itself, is not such a learning rule: it tends to 
settle into whichever local minimum is downhill from its starting 
point.  If complemented with a global search over the parameter 
space (say, via genetic algorithms or even a grid search), it can be.  
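
To make that concrete, here is a minimal sketch in Python.  It is 
only an illustration under stated assumptions: the one-dimensional 
"loss" below is a toy stand-in for a network's error surface (not 
an actual network), and the names loss, grad, descend and 
multi_start are made up for this example.  It runs plain gradient 
descent from several starting weights, then wraps it in a crude 
grid search over the starting point.

```python
# Toy illustration: gradient descent on a non-convex 1-D "loss".
# The function is an assumed stand-in for a network error surface;
# it has two minima, one lower (global) than the other (local).

def loss(w):
    return w**4 - 3.0 * w**2 + w

def grad(w):
    # Analytic derivative of loss(w).
    return 4.0 * w**3 - 6.0 * w + 1.0

def descend(w0, lr=0.01, steps=2000):
    """Plain gradient descent from initial weight w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def multi_start(starts):
    """Grid search over initial weights: run gradient descent from
    each start and keep the final weight with the lowest loss."""
    finals = [descend(w0) for w0 in starts]
    return min(finals, key=loss)

if __name__ == "__main__":
    for w0 in (-2.0, 2.0):
        w = descend(w0)
        print("start", w0, "-> w =", round(w, 3), "loss =", round(loss(w), 3))
    best = multi_start([-2.0, -1.0, 0.0, 1.0, 2.0])
    print("multi-start best w =", round(best, 3), "loss =", round(loss(best), 3))
```

On this toy surface, a run started at a positive weight ends in the 
shallower minimum, while the grid of starts recovers the deeper one.  
In a real network the parameter space is far too large to grid 
exhaustively, which is where something like a genetic algorithm 
might be the more practical global search.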

-=-=
M.E. Plutowski,  pluto%cs@ucsd.edu 
UCSD,  Computer Science and Engineering 0114
La Jolla, California 92093-0114