Path: utzoo!attcan!uunet!lll-winken!csd4.milw.wisc.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!decwrl!labrea!Portia!kortge
From: kortge@Portia.Stanford.EDU (Chris Kortge)
Newsgroups: comp.ai.neural-nets
Subject: Re: Data Compression
Summary: training time questions
Keywords: Principal Components Analysis
Message-ID: <984@Portia.Stanford.EDU>
Date: 18 Mar 89 20:04:30 GMT
References: <10199@nsc.nsc.com>
Sender: Chris Kortge <kortge@Portia.stanford.edu>
Reply-To: kortge@psych.stanford.edu (Chris Kortge)
Organization: Stanford University
Lines: 37

In article <10199@nsc.nsc.com> andrew@nsc.nsc.com (andrew) writes:
> [...]
>3a) Terence D. Sanger, "Optimal Unsupervised Learning in a Single-Layer
>   Linear Feedforward Neural Network", MIT AI Lab., NE43-743, Cambridge, 
>   MA 02139. TDS@wheaties.ai.mit.edu
>
>The Sanger 3a) paper is highly germane; he seems to have defined a method
>whereby maximal information preservation occurs across one layer, using
>only linear elements, and a purely local update structure. The learned
>matrix of weights becomes (row-wise) the eigenvectors of the input
>autocorrelation. [...]
>Highly relevant is his comparative data with respect to (cf. #2) self-supervised
>backprop, where numerous criteria show GHA ("Generalised Hebbian Algorithm")
>to be superior. These criteria include:
>- training time
>  [and several other things]

It wasn't clear to me from reading Sanger's thesis that the GHA is
actually faster than self-supervised backprop.  He says that, for
backprop, "training time still seems to be an exponential function of
the number of units in the network."  That seems likely to be
problem-dependent, though, and finding principal components is not a
hard problem as typical backprop problems go.  Does anyone know of an
actual scaling study on this?  (E.g., one where n-dimensional random
Gaussian vectors with m known principal components are used as inputs,
and n and m are varied while the percentage of variance explainable is
held constant.)
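
To make the proposed test inputs concrete, here is a minimal sketch of one
way they might be generated.  Everything in it is my own assumption rather
than anything from Sanger's thesis: the function names, the use of
Gram-Schmidt to pick the m directions, and the choice of equal-variance
noise in the remaining n - m directions.  It needs m < n, and the m
directions are the leading principal components only if the noise variance
stays below the smallest signal variance.

```python
import math
import random


def orthonormal_basis(n, m, rng):
    """Return m orthonormal n-vectors (Gram-Schmidt on Gaussian draws)."""
    basis = []
    while len(basis) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in basis:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            basis.append([a / norm for a in v])
    return basis


def noise_std(stds, n, frac):
    """Noise std per off-subspace direction so that the m signal
    directions carry a fraction `frac` of the total variance."""
    m = len(stds)
    signal = sum(s * s for s in stds)
    return math.sqrt(signal * (1.0 - frac) / (frac * (n - m)))


def sample(basis, stds, sigma, rng):
    """Draw one n-vector: signal along `basis`, noise orthogonal to it."""
    n = len(basis[0])
    x = [rng.gauss(0.0, sigma) for _ in range(n)]
    # Remove the part of the noise that lies in the signal subspace.
    for u in basis:
        dot = sum(a * b for a, b in zip(x, u))
        x = [a - dot * b for a, b in zip(x, u)]
    # Add an independent Gaussian coefficient along each signal direction.
    for u, s in zip(basis, stds):
        z = rng.gauss(0.0, s)
        x = [a + z * b for a, b in zip(x, u)]
    return x
```

A scaling study would then sweep n and m, calling noise_std() each time
with the same frac, and record how many patterns each algorithm needs to
reach a fixed reconstruction error.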

Another problem with the claim is that the fuzzy term "training time"
hides something important.  Namely, Sanger's algorithm trains the output
units one by one during the presentation of each pattern, and to my
knowledge this sequentiality is inherent.  Thus it could be that the GHA
is superior to backprop when measured in "pattern time" (number of
pattern presentations), but not when measured in real time (i.e.,
operations of an ideal parallel device).  Here again, I don't know the
answer; I would be interested in whatever info people have on this.
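
For reference, here is a minimal sketch of a single GHA pattern
presentation as I understand the rule, dw_i = eta * y_i * (x - sum over
k <= i of y_k * w_k), written so that the unit-by-unit order is explicit.
The function name gha_step and the list-of-lists weight layout are my own
choices for illustration.  The running residual is the dependency between
successive units that I mean above; the sketch only shows that dependency
and does not settle whether a parallel device could avoid it.

```python
def gha_step(W, x, eta):
    """Apply one GHA update in place for input pattern x.

    W is a list of m weight vectors (one per output unit), each of
    length n.  Returns the outputs y computed from the old weights.
    """
    # Linear outputs, all computed from the weights before the update.
    y = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]

    residual = list(x)
    for i, w in enumerate(W):
        # Subtract unit i's own contribution (using its old weights),
        # so residual = x - sum_{k <= i} y_k * w_k.
        residual = [r - y[i] * w_j for r, w_j in zip(residual, w)]
        # Hebbian update of unit i against the residual.
        W[i] = [w_j + eta * y[i] * r for w_j, r in zip(w, residual)]
    return y
```

Counting "pattern time" means counting calls to gha_step(); counting real
time means also charging for the m passes through the loop inside it.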

Chris Kortge