Xref: utzoo sci.math.stat:2010 comp.ai:8623 Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!pacbell.com!ucsd!sdcc6!beowulf!demers From: demers@beowulf.ucsd.edu (David Demers) Newsgroups: sci.math.stat,comp.ai Subject: Clustering code sought Message-ID: <17020@sdcc6.ucsd.edu> Date: 26 Feb 91 23:21:51 GMT Sender: news@sdcc6.ucsd.edu Followup-To: sci.math.stat Organization: CSE Dept., UC San Diego Lines: 58 Nntp-Posting-Host: beowulf.ucsd.edu Forgive me if this is a bit long... I am looking for two pieces of software: (1) a k-means program to group a large number of points (all of which are inside of the unit 3-sphere) (I don't really need to compute k-means, since the points are relatively uniformly distributed, I just need a fast way to partition the unit 3-sphere into k "boxes" of roughly equal volume and near-minimum surface area), and (2) a flexible clustering program. I have a SPARCstation, SGI Personal Iris & IBM PC(!) available as platforms, would prefer standard Unix stuff, C or FORTRAN. LONGER DESCRIPTION OF DATA AND PROBLEM: My data consists of pairs of 3 vectors, which are the independent and dependent variables of a smooth and continuous a.e. four-to-one map from a 3-d manifold onto the unit sphere (the interior, not just the surface...). Thus X->Y for X & Y both 3-vectors. The distribution of Y is nearly uniform in the sphere, and I want to cut it up into smaller bits, say by k-means, then cluster the associated vectors X into 4 clusters so that I can partition the entire set of data (25,000 or so sample points) into 4 separate sets, so that the inverse map for each of the 4 branches can be approximated. The data X for a region in Y should look like ellipsoids; or at least I know the convex hulls of the 4 clusters will not intersect. I do have a C program to do hierarchical clustering, originally by Yoshiro Miyata, I believe, and last modified by Andreas Stolcke, which I may be able to tweak. I'd like to be able to use different clustering criteria on the data. I do suspect single-linkage will be adequate, which is why the hierarchical program may be satisfactory if modifiable, but flexibility in choosing criteria would be nice. I have looked at Fukunaga and Duda & Hart. They are somewhat helpful in regards to theory (though there seems to be little theory in clustering...), but not for algorithms. I have begun to code this up, and can probably do it all myself in a week or so, but it is always nice to avoid reinventing the wheel, to reinvent a phrase. It's more fun to spend time analyzing the data and thinking about it than debugging C... If anyone can point me to good source code, references or even algorithms, I'd greatly appreciate it. -- Dave DeMers demers@cs.ucsd.edu Computer Science & Engineering C-014 demers%cs@ucsd.bitnet UC San Diego ...!ucsd!cs!demers La Jolla, CA 92093-0114 (619) 534-8187,-0688 ddemers@UCSD Brought to you by Super Global Mega Corp .com