Path: utzoo!attcan!uunet!lll-winken!csd4.milw.wisc.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!decwrl!labrea!Portia!kortge
From: kortge@Portia.Stanford.EDU (Chris Kortge)
Newsgroups: comp.ai.neural-nets
Subject: Re: Data Compression
Summary: training time questions
Keywords: Principal Components Analysis
Message-ID: <984@Portia.Stanford.EDU>
Date: 18 Mar 89 20:04:30 GMT
References: <10199@nsc.nsc.com>
Sender: Chris Kortge
Reply-To: kortge@psych.stanford.edu (Chris Kortge)
Organization: Stanford University
Lines: 37

In article <10199@nsc.nsc.com> andrew@nsc.nsc.com (andrew) writes:
> [...]
>3a) Terence D. Sanger, "Optimal Unsupervised Learning in a Single-Layer
> Linear Feedforward Neural Network", MIT AI Lab., NE43-743, Cambridge,
> MA 02139. TDS@wheaties.ai.mit.edu
>
>The Sanger 3a) paper is highly germane; he seems to have defined a method
>whereby maximal information preservation occurs across one layer, using
>only linear elements, and a purely local update structure. The learned
>matrix of weights becomes (row-wise) the eigenvectors of the input
>autocorrelation. [...]
>Highly relevant is his comparative data with respect to (cf. #2) self-supervised
>backprop, where numerous criteria show GHA ("Generalised Hebbian Algorithm")
>to be superior. These criteria include:
>- training time
> [and several other things]

It wasn't clear to me from reading Sanger's thesis that the GHA is
obviously faster than self-supervised backprop. He says that, for
backprop, "training time still seems to be an exponential function of
the number of units in the network." It seems like this would be
problem-dependent, though, and finding principal components is not that
tough, as typical backprop problems go. Does anyone know of an actual
scaling study on this? (E.g., one where n-dimensional random Gaussian
vectors with m known principal components are used as inputs, and n and
m are varied while keeping the percentage of variance explainable
constant, say.)

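A minimal sketch of the kind of test data described above, with a plain GHA trainer run on it, assuming NumPy. The function names (make_gaussian_data, gha_train, alignment), the variance profile, and the learning rate and epoch count are illustrative assumptions, not taken from Sanger's thesis. The update used is the matrix form of Sanger's rule as it is commonly stated, dW = lr * (y x^T - LT[y y^T] W), where LT keeps the lower triangle including the diagonal.

```python
import numpy as np


def make_gaussian_data(n, m, num_samples, explained=0.9, seed=0):
    """Zero-mean Gaussian vectors in R^n (n > m) whose top-m principal
    components carry a fraction `explained` of the total variance."""
    rng = np.random.default_rng(seed)
    # Random orthonormal basis; its first m columns are the known components.
    basis, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # Distinct, decreasing variances for the m signal directions
    # (an arbitrary profile chosen for illustration).
    signal = np.linspace(3.0, 1.0, m)
    signal *= explained / signal.sum()
    # Equal, smaller variances for the remaining n - m directions.
    noise = np.full(n - m, (1.0 - explained) / (n - m))
    variances = np.concatenate([signal, noise])
    coeffs = rng.standard_normal((num_samples, n)) * np.sqrt(variances)
    # Rotate the axis-aligned samples into the random basis.
    return coeffs @ basis.T, basis[:, :m].T


def gha_train(x, m, lr=0.01, epochs=40, seed=1):
    """Train m linear output units with Sanger's rule, one pattern at a time."""
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.standard_normal((m, x.shape[1]))
    for _ in range(epochs):
        for pattern in x:
            y = w @ pattern
            # Hebbian term minus lower-triangular decorrelation term.
            w += lr * (np.outer(y, pattern) - np.tril(np.outer(y, y)) @ w)
    return w


def alignment(w, components):
    """Absolute cosine between each learned row and the matching known component."""
    w_unit = w / np.linalg.norm(w, axis=1, keepdims=True)
    return np.abs(np.sum(w_unit * components, axis=1))


if __name__ == "__main__":
    x, components = make_gaussian_data(n=8, m=3, num_samples=1000)
    w = gha_train(x, m=3)
    print(alignment(w, components))
```

Varying n and m while holding `explained` fixed, and counting the pattern presentations needed to reach a given alignment, would give one data point per (n, m) pair. A real comparison would also need a self-supervised backprop network trained on the same data, which is not shown here.
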
Another problem with the claim is that the fuzzy term "training time"
hides something important. Namely, Sanger's algorithm trains output
units one-by-one during the presentation of each pattern, and to my
knowledge this sequentiality is inherent. Thus it could be that the GHA
is superior to backprop when measured in "pattern time", but not when
measured in real time (i.e. operations of an ideal parallel device).
Here again, I don't know the answer; I would be interested in whatever
info people have on this.

Chris Kortge