Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ncar!tank!eecae!cps3xx!flynn
From: flynn@pixel.cps.msu.edu (Patrick J. Flynn)
Newsgroups: comp.graphics
Subject: Re: How to map 24-bit RGB to best EGA/VGA palette?
Keywords: RGB EGA VGA color
Message-ID: <4503@cps3xx.UUCP>
Date: 7 Sep 89 02:47:47 GMT
References: <4379@cps3xx.UUCP> <126@vsserv.scri.fsu.edu> <3129@cbnewsm.ATT.COM> <7743@cbmvax.UUCP> <13319@well.UUCP> <586@celit.com> <4862@eos.UUCP> <13381@well.UUCP> <9464@venera.isi.edu> <9473@venera.isi.edu> <71733@yale-celray.yale.UUCP>
Sender: usenet@cps3xx.UUCP
Organization: Pixel Abusers
Lines: 102

In article <71733@yale-celray.yale.UUCP> craig@weedeater.math.yale.edu (Craig Kolb) writes:
>In article <9473@venera.isi.edu> raveling@isi.edu (Paul Raveling) writes:
>> [In the >>> article, Pat Flynn wrote:]
>>>  If so, one *could* attempt to reduce the palette
>>> through the application of a traditional clustering algorithm (there
>>> was an article in ACM TOMS on this last year; ...
>>
>>  [...]
>>	It might be possible that a Monte Carlo color selection algorithm
>>	would work well for some images.  I may try that as an experiment
>>	too.
>>
>>> ... and I
>>> seriously doubt that the `clusters' in color space (if any) are
>>> (say) multivariate Gaussian; I also doubt that people are willing to wait
>>> hours (!) for a clustering algorithm to reduce their palette.
>>
>>	Caramba!  Maybe I'll only skim that article.
>
>Unless I'm mistaken, the article in question is:
>
>Wan, Wong, and Prusinkiewicz, An Algorithm for Multidimensional
>  Data Clustering, Transactions on Mathematical Software,
>  Vol 14, #2 (June '88), pp. 153-162

That's the one I was referring to.  Next time I follow up, I'll do it
from my office (where the journals are), not from home :-) .

> [...]
>As far as speed is concerned, quantizing a 512x512x24-bit version of "lenna"
>to 256 colors on my Iris 4D60 takes approximately 5 seconds, which is
>typical for most images of similar size.

Good to see some actual numbers.  I should emphasize that taking a clustering
approach to color quantization does not *guarantee* that your results will take
a long time to come.  Taking a *naive* approach to clustering often may.

\begin{pedantic}
Most clustering algorithms base membership decisions on a proximity measure
defined on the `feature space' of interest.  Here, our feature space is 3D;
it may be RGB, maybe HSV, maybe something else.  The data to be clustered
is a sample from this space.  The two crucial differences between the data
we typically see in exploratory data anlaysis and the data  used in color
quantization are

 - Color data is discrete; for 24-bit color you only have 256^3 distinct
   triplets. In image regions of uniform color and intensity you may have
   multiple instances of the same triplet, and it's probably *not* a good
   idea to ignore the duplicates; they convey important information about
   the distribution of color in the image.  In many `traditional' clustering
   algm's, these duplicates might be ignored.

 - There's a hell of a lot more data than many clustering algorithms are
   designed to *realistically* handle (e.g., 256K 3-vectors for a 512x512
   image).  Algorithms that compute a
   proximity measure for *each* pair of 3-vectors will take forever to run on
   such a data set.  The statistician in me would normally say: `No problem.
   subsample the image to get, say, 1K 3-vectors, find a 256-cluster
   solution, and classify the remaining pixels by finding the Euclidean
   distance in color space between them and the various cluster centers that
   you obtained.' Unfortunately, subsampling creates the possibility that you'll
   subsample perceptually-important image structures out of existence.  Consider
   the example of two boring dark orange regions separated by a really
   nice-looking, bright purple border. If your subsampling scheme doesn't
   pick up the border pixels,  the reduced-pallette result will have a
   blurred border (or none at all!); perceptually, you'd probably prefer
   a little less discrimination between fine shades of dark orange in favor
   of a faithful reproduction of the bright purple.

So, if color quantization is to be attacked using cluster analysis, we need
to pay careful attention to the special issues presented by discrete,
voluminous image data.
\end{pedantic}

Here's one possible application of clustering, used as a postprocessing
step for a poor-quality color quantizer.  I probably won't get around to
implementing it anytime soon (gotta finish that PhD), but maybe someone
with spare time would be interested.

Suppose you have a true-color image, and one quick&dirty color
quantization routine, which produces only tolerable output.  View each
of the 256 color bins as a cluster center for a 256-cluster
partitioning of the color space occupied by the image.  Construct a
(stratified?) sample from the original image by picking up a subset of
the true-color pixels falling into each bin (so you have a few
3-vectors from each of the bins). Then cluster those babies.  See if
the resultant clustering is an improvement.  If it is, publish it in
SIGGRAPH and make big bucks.

Aside: it's interesting to examine the almost-inverse relationship between
color quantization (pallette reduction for data compression) and pseudocoloring
of monochrome or multispectral data (pallete *expansion* for increased
perceptual detail).  The image processing book by Wayne Niblack has a
very nice discussion of color bases and pseudocoloring of Landsat data.
I use pseudocoloring to judge the noisiness of the imagery I get from
a triangulation-based rangefinder.  It's surprising the detail that the
eye will extract from color contours that it can't get from gray-scale.

Best
Pat Flynn
------------------------------------------------------------------------------
Patrick Flynn, CS, Mich. State Univ. -- flynn@cps.msu.edu -- uunet!frith!flynn