Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!batcomputer!cornell!rochester!pt.cs.cmu.edu!g.gp.cs.cmu.edu!tgl From: tgl@g.gp.cs.cmu.edu (Tom Lane) Newsgroups: comp.compression Subject: Re: Request for JPEG specification. Message-ID: <12583@pt.cs.cmu.edu> Date: 2 Apr 91 23:51:36 GMT References: <1991Apr2.185934.6675@agate.berkeley.edu> Organization: Carnegie-Mellon University, CS/RI Lines: 104 In article <1991Apr2.185934.6675@agate.berkeley.edu>, greg@garnet.berkeley.edu (Greg Kuperberg) writes: > I would like to see a 1 page or shorter specification of JPEG with an > eye towards a conceptual understanding of the underlying data > compression technique. OK, I'll bite... Some background: JPEG works on either full-color or gray-scale images; it does not handle bilevel (black and white) images, at least not efficiently. It doesn't handle colormapped images either; you have to pre-expand those into an unmapped full-color representation. (Nonetheless, JPEG comes out a lot smaller than, say, GIF.) JPEG is a lossy compression method, meaning that it doesn't necessarily reconstruct the exact pixel-for-pixel input image. The technique exploits known properties of the human vision system. With proper parameter choices, it is usually hard to tell a decompressed image from the original *by eye*, but 'diff' will show lots of differences :-). There are a lot of parameters to the JPEG compression process. You can trade off compressed image size against reconstructed image quality over a *very* wide range by adjusting the parameters. You can get image quality ranging from op-art (at 100x smaller than the original image) to quite indistinguishable from the source (at about 3x smaller). My experience is that the threshold of visible difference from the source image is somewhere around 10x to 20x smaller than the original. (Note: the reference image size here is an uncompressed 24-bit-color representation. For comparison, GIF is usually about 3x to 4x smaller than this form. Bear in mind that as a colormapped representation, GIF also loses data compared to full color RGB.) OK, now the outline of the compression algorithm: 1. Transform the image into a suitable color space. This is a no-op for grayscale, but for color images you generally want to transform RGB into a luminance/chrominance color space (YUV, YCbCr, etc). The luminance component is gray scale and the other two axes are color information. The reason for doing this is that you can afford to lose a lot more information in the chrominance components than you can in the luminance component; the human eye is not as sensitive to high-frequency color info as it is to high-frequency luminance. (See any TV system for precedents.) You don't have to change the color space if you don't want to. The remainder of the algorithm works on each color component independently, and doesn't really care what the component is. However, to get optimal results you must choose the compression parameters for a component with an eye to its interpretation. 2. (Optional) Subsample each component by averaging together groups of pixels. A typical choice is to leave the luminance component at full resolution, but to subsample the color components 2:1. (You could not get away with this in an RGB color space...) 3. Group the pixel values for each component into 8x8 blocks. Transform each 8x8 block through a discrete cosine transform (DCT); this is a relative of the Fourier transform and likewise gives a frequency map (with 8x8 components). Thus you now have numbers representing the average value in each block and successively higher-frequency changes within the block. The motivation for doing this is that you can now throw away high-frequency information without affecting low-frequency information. (The DCT transform itself is reversible.) 4. In each block, divide each of the 64 frequency components by a separate "quantization coefficient", and round the results to integers. This is the fundamental information-losing step. A Q.C. of 1 loses no information, larger Q.C.s lose successively more info. The higher frequencies are normally reduced more than the lower. (All 64 Q.C.s are parameters to the compression process; tuning them for best results is a black art. It seems likely that the best values are yet to be discovered.) 5. Encode the reduced coefficients using either Huffman or arithmetic coding. The JPEG spec defines a complex multi-state statistical model for this coding (see my previous post about the difference between modeling and coding). Also, a particular variant of arithmetic coding is specified. 6. Tack on appropriate headers, etc, and output the result. In an "interchange" JPEG file, all of the compression parameters are included in the headers so that the decompressor can reverse the process. For specialized applications, the spec permits the parameters to be omitted from the file; this saves several hundred bytes of overhead, but means that the decompressor must know what parameters the compressor used. The decompression algorithm reverses this process, and typically adds some smoothing steps to reduce pixel-to-pixel discontinuities. As far as I know, there is no detailed specification of JPEG available on the net. For more information try: Gregory Wallace, "Overview of the JPEG (ISO/CCITT) Still Image Compression Standard", Proceedings of the SPIE Symposium on Electronic Imaging Science and Technology, Santa Clara, CA, Feb. 1990. In SPIE Vol 1244, Image Processing Algorithms and Techniques (1990). ``Digital Image Compression Techniques'' by M. Rabbani and P. W. Jones (Tutorial Text Series, Vol. TT7, SPIE Press, Bellingham, WA, 1991). I have not seen either of these... -- tom lane Internet: tgl@cs.cmu.edu BITNET: tgl%cs.cmu.edu@cmuccvma