Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!spool.mu.edu!munnari.oz.au!mel.dit.csiro.au!latcs1!sietsma
From: sietsma@latcs1.oz.au (Jocelyn Sietsma Penington)
Newsgroups: comp.ai.neural-nets
Subject: Re: generalization in NN's
Keywords: ldf generalization
Message-ID: <9881@latcs1.oz.au>
Date: 3 Apr 91 05:04:33 GMT
Article-I.D.: latcs1.9881
References: <1991Apr2.205240.24668@milton.u.washington.edu>
Reply-To: sietsma@latcs1.oz.au (Jocelyn Sietsma Penington)
Organization: Comp Sci, La Trobe Uni, Australia
Lines: 54

In article <1991Apr2.205240.24668@milton.u.washington.edu> nealiphc@milton.u.washington.edu (Phillip Neal) writes:
>I have a problem with the ability of a neural net to generalize.
...
>I break the data into a 400 observation training set and
>a 200 observation test set.
...
[NN does better on training set than linear discr. fn., but poorer on test set]
>And no matter how long I let the NN run, and no matter what
>number of hidden layer nodes, I always get about the same
>results.
>
>I know I am violating the rule of thumb to have 10 times more
>training data than nodes in the net. But hey, data is expensive.

For starters, I think the rule of thumb quoted above is nonsense - it
doesn't take any notice of the characteristics of your data. I think it
was calculated for training random inputs to random outputs, and who
wants to do that?

The problem here may well be that you are actually training too long.
See the paper by Chauvin in NIPS 2, or by Weigend, Huberman and
Rumelhart (Predicting the future: a connectionist approach -
Stanford-PDP-90-01, to appear in Int'l J. of Neural Systems) for graphs
showing that as training continues, performance on the training set
continuously improves, but performance on the test set reaches a maximum
and then declines.

Unfortunately the only cures I know are expensive, either in data or
time.

1. You can split your data set in three: training, cross-validation and
testing.
Train, periodically checking the error rate on the cross-validation set.
When this starts to rise, stop training. Use the test set to find the
true generalization performance.

2. You can reduce the effective size of your network. The 2 papers I
referenced above are about adding an extra cost term to the standard
back-prop of errors to encourage the network to eliminate unnecessary
units or connections. This appears to prevent the overtraining problem.
Unfortunately it greatly increases the time required for training, and
getting the parameter values right might be difficult. (I haven't tried
these, so I don't know.)

2b. You MIGHT get some improvement by taking your trained network as it
is now and removing any redundant units by one of the available pruning
methods. On a toy problem, I have found that this improves
generalization. (Sietsma & Dow, Neural Networks 1991) See Mozer &
Smolensky, NIPS 1, and Le Cun, Denker & Solla, NIPS 2, for alternate
methods of pruning trained networks.

hope this helps,
Jocelyn
--
(Jocelyn Penington, a.k.a. Sietsma - feel free to use either)
Email:   sietsma@LATCS1.oz.au
Address: Materials Research Laboratory
         PO Box 50, Melbourne 3032
Phone:   (03) 319 3775 or (03) 479 1057
This article does not commit me, LaTrobe Uni or M.R.L. to any act or opinion.
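
For anyone wanting to try cure 1 (train, check the cross-validation
error periodically, stop when it starts to rise), here is a minimal
sketch. Everything named in it is an assumption for illustration only:
train_step and val_error are hypothetical callables you would supply
for your own network, and check_every and patience are made-up
parameters (patience=1 is the literal "stop as soon as it rises";
a larger value tolerates a noisy error curve). The toy "model" below
just counts epochs so the loop can be run without a real network.

```python
import copy


def train_with_early_stopping(model, train_step, val_error,
                              max_epochs=1000, check_every=10, patience=3):
    """Train until the cross-validation error stops improving.

    model      -- any object holding the weights (hypothetical)
    train_step -- callable(model): one pass over the training set
    val_error  -- callable(model) -> float: error on the cross-validation set
    patience   -- consecutive checks without improvement before stopping
    Returns (best_model, best_epoch, best_error).
    """
    best_error = float("inf")
    best_model = copy.deepcopy(model)
    best_epoch = 0
    bad_checks = 0
    for epoch in range(1, max_epochs + 1):
        train_step(model)
        if epoch % check_every != 0:
            continue  # only look at the cross-validation set periodically
        err = val_error(model)
        if err < best_error:
            # New best: remember a copy of the weights at this point.
            best_error, best_epoch = err, epoch
            best_model = copy.deepcopy(model)
            bad_checks = 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break  # cross-validation error has been rising: stop
    return best_model, best_epoch, best_error


# Toy stand-in for a network: the cross-validation error falls until
# epoch 50 and rises afterwards, like the curves described above.
toy = {"epochs": 0}


def toy_step(m):
    m["epochs"] += 1


def toy_cv_error(m):
    return (m["epochs"] - 50) ** 2 / 100 + 1


best, best_epoch, best_err = train_with_early_stopping(
    toy, toy_step, toy_cv_error)
print(best_epoch, best_err)  # 50 1.0
```

Keeping a copy of the best weights means you get back the network from
the minimum of the cross-validation curve rather than the slightly
overtrained one at the moment you stopped. The test set is not touched
anywhere in this loop; it is only used once afterwards, on best_model.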