Path: utzoo!censor!isgtec!wido From: wido@isgtec.uucp (Wido Menhardt) Newsgroups: comp.theory Subject: Re: Least-squares clustering. Message-ID: <989@isgtec.UUCP> Date: 14 Apr 91 20:41:19 GMT References: <11014@uwm.edu> Sender: news@isgtec.UUCP Reply-To: wido@isgtec.UUCP (Wido Menhardt) Organization: ISG Technologies Inc. Mississauga Ont. Canada Lines: 27 In article <11014@uwm.edu> markh@csd4.csd.uwm.edu (Mark William Hopkins) writes: > > I have a particular algorithm in mind. It takes a set, S, of N-vectors > and partitions it into sets S1, S2 such that > > sigma(S1) + sigma(S2) is minimized > where > sigma(S) = total square deviation of S > (that is, sum x in S: |x - average(S)|^2) > > It runs sequentually through S and calculates S1 and S2 in one pass. the algorithm which solves this problem suboptimally and iteratively is known as K-MEANS, very popular, and it works as follows: 1) assign each vector of S randomly (or systematically) to either S1 or S2 2) compute the mean vectors fro S1 and S2 3) re-assign each vector of S to the class (S1,S2) whose mean vector is closer (this takes care of the square deviation criterion) 4) go to 2). stop the iteration, if nothing changes anymore. the result depends on (1). I would be interested to hear about a one-pass algorithm: after all, you do kot know average(S1) in advance .... -wido.