Xref: utzoo sci.math:17099 comp.ai.neural-nets:3300
Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!wuarchive!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!dali.cs.montana.edu!milton!serval!yoda.eecs.wsu.edu!rcrane
From: rcrane@yoda.eecs.wsu.edu (Robert Crane - EE582)
Newsgroups: sci.math,comp.ai.neural-nets
Subject: Pseudo-inverse formula problem.  HELP!
Message-ID: <1991Apr27.213346.9351@serval.net.wsu.edu>
Date: 27 Apr 91 21:33:46 GMT
Sender: news@serval.net.wsu.edu (USENET News System)
Organization: Washington State University
Lines: 52
Originator: rcrane@yoda.eecs.wsu.edu

In the neural network literature, it is stated that one way to
calculate a weight vector, w, that minimizes the error for simple
linear units (such as perceptrons) is to use the pseudo-inverse
formula:

     w = Q*a'*b

     a is the pxn matrix of input patterns (one pattern per row)
     b is the px1 vector of training outputs
     Q = inverse(a'*a)

     and the pseudo-inverse is Q*a'.

It is stated that the method for computing the weights applies only if
Q EXISTS, and that this condition requires the input patterns to be
linearly independent.

My problem is that in practical examples, I have been using
this formula with great success even though I have over
p=100 input vectors (divided into two classes which are linearly
separable) spanning only two dimensions.  Clearly, these vectors
canNOT be linearly independent, yet I can compute this equation.
The solution appears to be minimizing the mean square error in
every case in the problems I have been working on.
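
A minimal sketch of the kind of experiment described above, assuming the
patterns are stacked as the rows of a, so that a'*a is only 2x2 whatever
p is. The data, the seed and the function name normal_equation_weights
are made up for illustration; this is not my actual data or code.

```python
import random

def normal_equation_weights(a, b):
    """Return (w, det) with w = inv(a'*a)*a'*b for a list of 2-D patterns a."""
    # Entries of the symmetric 2x2 matrix a'*a.
    s11 = sum(x[0] * x[0] for x in a)
    s12 = sum(x[0] * x[1] for x in a)
    s22 = sum(x[1] * x[1] for x in a)
    # Entries of the 2x1 vector a'*b.
    t1 = sum(x[0] * y for x, y in zip(a, b))
    t2 = sum(x[1] * y for x, y in zip(a, b))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("a'*a is singular, so Q does not exist")
    # Closed-form inverse of a 2x2 matrix applied to a'*b.
    w = [(s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det]
    return w, det

random.seed(0)  # arbitrary seed, only for repeatability
# Made-up data: 50 patterns per class in two well-separated squares.
a = [[random.uniform(1, 3), random.uniform(1, 3)] for _ in range(50)]
a += [[random.uniform(-3, -1), random.uniform(-3, -1)] for _ in range(50)]
b = [1.0] * 50 + [-1.0] * 50

w, det = normal_equation_weights(a, b)
print("p =", len(a), " det(a'*a) =", det, " w =", w)
```

With p=100 patterns in two dimensions the determinant of a'*a comes out
nonzero here, so Q can be computed even though the 100 patterns cannot
all be linearly independent.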


To be a little more explicit, the mean square error is defined
to be:

   E = .5*(a*w-b)'*(a*w-b)
     = .5*[w'*a'*a*w - 2*w'*a'*b + b'*b] 

Taking the partial derivative of E with respect to w 
and setting it equal to 0 to find the minimum we get:

   0 = a'*a*w - a'*b
   a'*a*w = a'*b

 and thus

   w = inv(a'*a)*a'*b.
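
The derivative step can be checked numerically. The sketch below
compares the analytic gradient a'*(a*w-b) = a'*a*w - a'*b with a
central finite difference of E. The small data set, the test point w
and the helper names are hypothetical, chosen only for illustration.

```python
def error(a, b, w):
    """E = .5*(a*w-b)'*(a*w-b), with a given as a list of pattern rows."""
    r = [sum(xj * wj for xj, wj in zip(x, w)) - y for x, y in zip(a, b)]
    return 0.5 * sum(ri * ri for ri in r)

def gradient(a, b, w):
    """Analytic gradient a'*(a*w-b)."""
    r = [sum(xj * wj for xj, wj in zip(x, w)) - y for x, y in zip(a, b)]
    return [sum(x[j] * ri for x, ri in zip(a, r)) for j in range(len(w))]

def numeric_gradient(a, b, w, h=1e-5):
    """Central finite-difference approximation of dE/dw."""
    g = []
    for j in range(len(w)):
        wp, wm = list(w), list(w)
        wp[j] += h
        wm[j] -= h
        g.append((error(a, b, wp) - error(a, b, wm)) / (2 * h))
    return g

# Made-up example: four 2-D patterns and an arbitrary test point w.
a = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.5]]
b = [1.0, 1.0, -1.0, -1.0]
w = [0.3, -0.7]

print("analytic:", gradient(a, b, w))
print("numeric: ", numeric_gradient(a, b, w))
```

The two gradients should agree up to rounding, since E is quadratic in w.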

Again, it is stated that Q=inverse(a'*a) exists only if
the vectors that make up a are linearly independent.
But as I have stated before, I have been computing this
when the vectors that make up a are not linearly independent.

What is going on here????
Can someone please help me in explaining the mathematics involved here?
Does not the pseudo-inverse exist for any nxp matrix???
-- 
-bob crane (rcrane@yoda.eecs.wsu.edu)