Path: utzoo!attcan!uunet!ginosko!gem.mps.ohio-state.edu!tut.cis.ohio-state.edu!bloom-beacon!atrp.mit.edu!ashok From: ashok@atrp.mit.edu (Ashok C. Popat) Newsgroups: comp.graphics Subject: Re: SigGraph Fractal Compression Message-ID: <13785@bloom-beacon.MIT.EDU> Date: 24 Aug 89 11:53:45 GMT References: <1934@uceng.UC.EDU> <127@bambam.UUCP> Sender: daemon@bloom-beacon.MIT.EDU Reply-To: ashok@atrp.mit.edu (Ashok C. Popat) Organization: MIT Lines: 105 In article <127@bambam.UUCP> bpendlet@bambam.UUCP (Bob Pendleton) writes: > >DCT? Could some kind person post a definition of DCT? A brief >description of the basic principles would be nice as would a pointer >to a survey article on the subject. Transform coding is discussed at length in: N.S. Jayant and P. Noll, _Digital Coding of Waveforms: Principles and Applications to Speech and Video_, Prentice-Hall, Englewood Cliffs, N.J., 1984, Chapter 12. The seminal paper on the DCT is: N. Ahmed, T. Natarajan and K.R. Rao, "Discrete Cosine Transform," IEEE Trans. on Computers, pp. 90-93, Jan. 1974. The following two papers are on related topics: H.S. Malvar and D.S. Staelin, "Reduction of Blocking Effects in Image Coding with a Lapped Orthogonal Transform," Proceedings from International Conf. on Acoustics, Speech, and Signal Processing, New York, April 11-14, 1988. Also see related article by the same authors in the Feb '89 IEEE Trans. on Commun. J.W. Woods, S. O'Neil, "Subband Coding of Images," IEEE Trans. Speech, Acoust., Signal Processing, Oct. 1986, pp. 1278-1288. Now here's my attempt to explain the DCT without going in to much math: DECORRELATION AND ENERGY-COMPACTION: Suppose we have an image that consists of 512 by 512 pixels, where each pixel takes on a continuous real value. Our goal is to digitize the image efficiently --- i.e., represent it using not more than X bits with as great fidelity as possible. Think of our image as being comprised of contiguous, mutually-exclusive tiles or blocks, each block consisting of, say, 8 by 8 pixels. Hence our 512 by 512 image contains 4096 such blocks. Each block can be regarded as a vector in 64-dimensional space. Now if we were to do a statistical analysis of the components of the blocks, we would find that there is some correlation between components, and that all components have about the same variance. In order to digitize the components efficiently, we'd have to use a relatively complicated scheme that takes advantage of the correlation between components in each block, and we'd probably want to use the same quantization scheme for all of the components in a block. An alternative is to apply an energy-preserving transformation that gets rid of the correlation between components prior to digitization. A necessary consequence of the energy-preserving decorrelation of components is the "compaction" of energy --- the components no longer have the same variance, although the *sum* of the variances is the same as before. In fancy language we are saying that the transformation is unitary and that it diagonalizes the autocovariance matrix. The fact that the components no longer have the same variance allows a greater number of quantization bits to be allocated to the components that "need" them most, or alternatively, it allows the "unimportant" components to be simply discarded. And since the components are now uncorrelated, little would be gained by using anything other than scalar (Lloyd-Max) quantization to accomplish the digitizing. This is the basic idea behind transform coding. THE TRANSFORMATION: If we assume that the image is well-modeled as a wide-sense stationary random process (often assumed for simplicity in analysis), then the transformation that achieves perfect decorrelation is the Karhounen-Loeve transform (KLT). The columns of the forward KLT matrix are simply the eigenvectors of the autocovariance matrix for the blocks. The KLT is, in general, difficult to compute. This is where the DCT comes in. The rows of the DCT matrix are simply sampled cosines, and hence it is easy to compute. The DCT has been found to give very good energy-compaction in image coding. Also, when a first-order Gauss-Markov model is used to describe images (where only adjacent-pixel correlation is accounted for), the performance of the DCT approaches that of the KLT as the correlation approaches unity. THE LOT AND QMFs: For a given number of transform coefficients, greater energy-compaction can be achieved by allowing the transformation domain to extend beyond the block boundaries. Also, the problem of "block artifacts" (the visibility of the block boundaries resulting from abrupt changes in the way the coefficients are quantized from block to block) is mitigated in this way. This leads to the Lapped-orthogonal transform (LOT) and quadrature-mirror filter (QMF) banks. When the latter is employed, the term "transform coding" is no longer used; rather, "subband coding" is used. QMFs go back to 1977 and LOTs to 1984, but their application to image coding is only beginning to become widespread. There are a number of variables in a DCT coder, including the size of the blocks, whether the coder is adaptive, the manner in which the coefficients are quantized, and whether the rate is fixed or variable. Also, it should be mentioned that there are two varieties of DCT, even-symmetrical and odd-symmetrical, but the former is nearly always used. Ashok Chhabedia Popat MIT Rm 36-665 (617) 253-7302