Path: utzoo!censor!isgtec!wido
From: wido@isgtec.uucp (Wido Menhardt)
Newsgroups: comp.theory
Subject: Re: Least-squares clustering.
Message-ID: <989@isgtec.UUCP>
Date: 14 Apr 91 20:41:19 GMT
References: <11014@uwm.edu>
Sender: news@isgtec.UUCP
Reply-To: wido@isgtec.UUCP (Wido Menhardt)
Organization: ISG Technologies Inc. Mississauga Ont. Canada
Lines: 27

In article <11014@uwm.edu> markh@csd4.csd.uwm.edu (Mark William Hopkins) writes:
> 
>    I have a particular algorithm in mind.  It takes a set, S, of N-vectors
> and partitions it into sets S1, S2 such that
> 
> 		    sigma(S1) + sigma(S2) is minimized
>                     where
>                        sigma(S) = total square deviation of S
> 			  (that is, sum x in S: |x - average(S)|^2)
> 
> It runs sequentually through S and calculates S1 and S2 in one pass.

the algorithm which solves this problem suboptimally and iteratively 
is known as K-MEANS, very popular, and it works as follows:

1) assign each vector of S randomly (or systematically) to either S1 or S2
2) compute the mean vectors fro S1 and S2
3) re-assign each vector of S to the class (S1,S2) whose mean vector is 
   closer (this takes care of the square deviation criterion)
4) go to 2). 

stop the iteration, if nothing changes anymore. the result depends on (1).

I would be interested to hear about a one-pass algorithm: after all, you 
do kot know average(S1) in advance ....

-wido.