Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ncar!tank!eecae!cps3xx!flynn From: flynn@pixel.cps.msu.edu (Patrick J. Flynn) Newsgroups: comp.graphics Subject: Re: How to map 24-bit RGB to best EGA/VGA palette? Keywords: RGB EGA VGA color Message-ID: <4503@cps3xx.UUCP> Date: 7 Sep 89 02:47:47 GMT References: <4379@cps3xx.UUCP> <126@vsserv.scri.fsu.edu> <3129@cbnewsm.ATT.COM> <7743@cbmvax.UUCP> <13319@well.UUCP> <586@celit.com> <4862@eos.UUCP> <13381@well.UUCP> <9464@venera.isi.edu> <9473@venera.isi.edu> <71733@yale-celray.yale.UUCP> Sender: usenet@cps3xx.UUCP Organization: Pixel Abusers Lines: 102 In article <71733@yale-celray.yale.UUCP> craig@weedeater.math.yale.edu (Craig Kolb) writes: >In article <9473@venera.isi.edu> raveling@isi.edu (Paul Raveling) writes: >> [In the >>> article, Pat Flynn wrote:] >>> If so, one *could* attempt to reduce the palette >>> through the application of a traditional clustering algorithm (there >>> was an article in ACM TOMS on this last year; ... >> >> [...] >> It might be possible that a Monte Carlo color selection algorithm >> would work well for some images. I may try that as an experiment >> too. >> >>> ... and I >>> seriously doubt that the `clusters' in color space (if any) are >>> (say) multivariate Gaussian; I also doubt that people are willing to wait >>> hours (!) for a clustering algorithm to reduce their palette. >> >> Caramba! Maybe I'll only skim that article. > >Unless I'm mistaken, the article in question is: > >Wan, Wong, and Prusinkiewicz, An Algorithm for Multidimensional > Data Clustering, Transactions on Mathematical Software, > Vol 14, #2 (June '88), pp. 153-162 That's the one I was referring to. Next time I follow up, I'll do it from my office (where the journals are), not from home :-) . > [...] >As far as speed is concerned, quantizing a 512x512x24-bit version of "lenna" >to 256 colors on my Iris 4D60 takes approximately 5 seconds, which is >typical for most images of similar size. Good to see some actual numbers. I should emphasize that taking a clustering approach to color quantization does not *guarantee* that your results will take a long time to come. Taking a *naive* approach to clustering often may. \begin{pedantic} Most clustering algorithms base membership decisions on a proximity measure defined on the `feature space' of interest. Here, our feature space is 3D; it may be RGB, maybe HSV, maybe something else. The data to be clustered is a sample from this space. The two crucial differences between the data we typically see in exploratory data anlaysis and the data used in color quantization are - Color data is discrete; for 24-bit color you only have 256^3 distinct triplets. In image regions of uniform color and intensity you may have multiple instances of the same triplet, and it's probably *not* a good idea to ignore the duplicates; they convey important information about the distribution of color in the image. In many `traditional' clustering algm's, these duplicates might be ignored. - There's a hell of a lot more data than many clustering algorithms are designed to *realistically* handle (e.g., 256K 3-vectors for a 512x512 image). Algorithms that compute a proximity measure for *each* pair of 3-vectors will take forever to run on such a data set. The statistician in me would normally say: `No problem. subsample the image to get, say, 1K 3-vectors, find a 256-cluster solution, and classify the remaining pixels by finding the Euclidean distance in color space between them and the various cluster centers that you obtained.' Unfortunately, subsampling creates the possibility that you'll subsample perceptually-important image structures out of existence. Consider the example of two boring dark orange regions separated by a really nice-looking, bright purple border. If your subsampling scheme doesn't pick up the border pixels, the reduced-pallette result will have a blurred border (or none at all!); perceptually, you'd probably prefer a little less discrimination between fine shades of dark orange in favor of a faithful reproduction of the bright purple. So, if color quantization is to be attacked using cluster analysis, we need to pay careful attention to the special issues presented by discrete, voluminous image data. \end{pedantic} Here's one possible application of clustering, used as a postprocessing step for a poor-quality color quantizer. I probably won't get around to implementing it anytime soon (gotta finish that PhD), but maybe someone with spare time would be interested. Suppose you have a true-color image, and one quick&dirty color quantization routine, which produces only tolerable output. View each of the 256 color bins as a cluster center for a 256-cluster partitioning of the color space occupied by the image. Construct a (stratified?) sample from the original image by picking up a subset of the true-color pixels falling into each bin (so you have a few 3-vectors from each of the bins). Then cluster those babies. See if the resultant clustering is an improvement. If it is, publish it in SIGGRAPH and make big bucks. Aside: it's interesting to examine the almost-inverse relationship between color quantization (pallette reduction for data compression) and pseudocoloring of monochrome or multispectral data (pallete *expansion* for increased perceptual detail). The image processing book by Wayne Niblack has a very nice discussion of color bases and pseudocoloring of Landsat data. I use pseudocoloring to judge the noisiness of the imagery I get from a triangulation-based rangefinder. It's surprising the detail that the eye will extract from color contours that it can't get from gray-scale. Best Pat Flynn ------------------------------------------------------------------------------ Patrick Flynn, CS, Mich. State Univ. -- flynn@cps.msu.edu -- uunet!frith!flynn