Path: utzoo!attcan!uunet!ginosko!gem.mps.ohio-state.edu!tut.cis.ohio-state.edu!bloom-beacon!atrp.mit.edu!ashok
From: ashok@atrp.mit.edu (Ashok C. Popat)
Newsgroups: comp.graphics
Subject: Re: SigGraph Fractal Compression
Message-ID: <13785@bloom-beacon.MIT.EDU>
Date: 24 Aug 89 11:53:45 GMT
References: <1934@uceng.UC.EDU> <127@bambam.UUCP>
Sender: daemon@bloom-beacon.MIT.EDU
Reply-To: ashok@atrp.mit.edu (Ashok C. Popat)
Organization: MIT
Lines: 105

In article <127@bambam.UUCP> bpendlet@bambam.UUCP (Bob Pendleton) writes:
>
>DCT? Could some kind person post a definition of DCT? A brief
>description of the basic principles would be nice as would a pointer
>to a survey article on the subject. 

Transform coding is discussed at length in:

N.S. Jayant and P. Noll, _Digital Coding of Waveforms: Principles and
Applications to Speech and Video_, Prentice-Hall, Englewood Cliffs, N.J.,
1984, Chapter 12.

The seminal paper on the DCT is:

N. Ahmed, T. Natarajan and K.R. Rao, "Discrete Cosine Transform,"
IEEE Trans. on Computers, pp. 90-93, Jan. 1974.

The following two papers are on related topics:

H.S.  Malvar and D.S. Staelin, "Reduction of Blocking Effects in
Image Coding with a Lapped Orthogonal Transform," Proceedings from
International Conf. on Acoustics, Speech, and Signal Processing,
New York, April 11-14, 1988.  Also see related article by the same
authors in the Feb '89 IEEE Trans. on Commun.

J.W. Woods, S. O'Neil, "Subband Coding of Images," IEEE Trans. Speech,
Acoust., Signal Processing, Oct. 1986, pp. 1278-1288.

Now here's my attempt to explain the DCT without going in to much math:

DECORRELATION AND ENERGY-COMPACTION:

Suppose we have an image that consists of 512 by 512 pixels, where
each pixel takes on a continuous real value.  Our goal is to digitize
the image efficiently --- i.e., represent it using not more than X
bits with as great fidelity as possible.

Think of our image as being comprised of contiguous,
mutually-exclusive tiles or blocks, each block consisting of, say, 8
by 8 pixels.  Hence our 512 by 512 image contains 4096 such blocks.
Each block can be regarded as a vector in 64-dimensional space.  Now
if we were to do a statistical analysis of the components of the
blocks, we would find that there is some correlation between
components, and that all components have about the same variance.  In
order to digitize the components efficiently, we'd have to use a
relatively complicated scheme that takes advantage of the correlation
between components in each block, and we'd probably want to use the
same quantization scheme for all of the components in a block.

An alternative is to apply an energy-preserving transformation that
gets rid of the correlation between components prior to digitization.
A necessary consequence of the energy-preserving decorrelation of
components is the "compaction" of energy --- the components no longer
have the same variance, although the *sum* of the variances is the
same as before.  In fancy language we are saying that the
transformation is unitary and that it diagonalizes the autocovariance
matrix.

The fact that the components no longer have the same variance allows a
greater number of quantization bits to be allocated to the components
that "need" them most, or alternatively, it allows the "unimportant"
components to be simply discarded. And since the components are now
uncorrelated, little would be gained by using anything other than
scalar (Lloyd-Max) quantization to accomplish the digitizing.  This is
the basic idea behind transform coding.

THE TRANSFORMATION:

If we assume that the image is well-modeled as a wide-sense stationary
random process (often assumed for simplicity in analysis), then the
transformation that achieves perfect decorrelation is the
Karhounen-Loeve transform (KLT).  The columns of the forward KLT
matrix are simply the eigenvectors of the autocovariance matrix for
the blocks.

The KLT is, in general, difficult to compute.  This is where the DCT
comes in.  The rows of the DCT matrix are simply sampled cosines, and
hence it is easy to compute.  The DCT has been found to give very good
energy-compaction in image coding.  Also, when a first-order
Gauss-Markov model is used to describe images (where only
adjacent-pixel correlation is accounted for), the performance of the
DCT approaches that of the KLT as the correlation approaches unity.

THE LOT AND QMFs:

For a given number of transform coefficients, greater
energy-compaction can be achieved by allowing the transformation
domain to extend beyond the block boundaries. Also, the problem of
"block artifacts" (the visibility of the block boundaries resulting
from abrupt changes in the way the coefficients are quantized from
block to block) is mitigated in this way.  This leads to the
Lapped-orthogonal transform (LOT) and quadrature-mirror filter (QMF)
banks.  When the latter is employed, the term "transform coding" is no
longer used; rather, "subband coding" is used.  QMFs go back to 1977
and LOTs to 1984, but their application to image coding is only
beginning to become widespread.

There are a number of variables in a DCT coder, including the size of
the blocks, whether the coder is adaptive, the manner in which the
coefficients are quantized, and whether the rate is fixed or variable.
Also, it should be mentioned that there are two varieties of DCT,
even-symmetrical and odd-symmetrical, but the former is nearly always
used.

Ashok Chhabedia Popat   MIT Rm 36-665 (617) 253-7302