Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!batcomputer!cornell!rochester!pt.cs.cmu.edu!g.gp.cs.cmu.edu!tgl
From: tgl@g.gp.cs.cmu.edu (Tom Lane)
Newsgroups: comp.compression
Subject: Re: Request for JPEG specification.
Message-ID: <12583@pt.cs.cmu.edu>
Date: 2 Apr 91 23:51:36 GMT
References: <1991Apr2.185934.6675@agate.berkeley.edu>
Organization: Carnegie-Mellon University, CS/RI
Lines: 104

In article <1991Apr2.185934.6675@agate.berkeley.edu>, greg@garnet.berkeley.edu (Greg Kuperberg) writes:
> I would like to see a 1 page or shorter specification of JPEG with an
> eye towards a conceptual understanding of the underlying data
> compression technique.

OK, I'll bite...

Some background:

JPEG works on either full-color or gray-scale images; it does not handle
bilevel (black and white) images, at least not efficiently.  It doesn't
handle colormapped images either; you have to pre-expand those into an
unmapped full-color representation.  (Nonetheless, JPEG comes out a lot
smaller than, say, GIF.)

JPEG is a lossy compression method, meaning that it doesn't necessarily
reconstruct the exact pixel-for-pixel input image.  The technique exploits
known properties of the human vision system.  With proper parameter choices,
it is usually hard to tell a decompressed image from the original *by eye*,
but 'diff' will show lots of differences :-).

There are a lot of parameters to the JPEG compression process.  You can
trade off compressed image size against reconstructed image quality over a
*very* wide range by adjusting the parameters.  You can get image quality
ranging from op-art (at 100x smaller than the original image) to quite
indistinguishable from the source (at about 3x smaller).  My experience is
that the threshold of visible difference from the source image is somewhere
around 10x to 20x smaller than the original.  (Note: the reference image
size here is an uncompressed 24-bit-color representation.  For comparison,
GIF is usually about 3x to 4x smaller than this form.  Bear in mind that as
a colormapped representation, GIF also loses data compared to full color RGB.)


OK, now the outline of the compression algorithm:

1. Transform the image into a suitable color space.  This is a no-op for
grayscale, but for color images you generally want to transform RGB into
a luminance/chrominance color space (YUV, YCbCr, etc).  The luminance
component is gray scale and the other two axes are color information.
The reason for doing this is that you can afford to lose a lot more
information in the chrominance components than you can in the luminance
component; the human eye is not as sensitive to high-frequency color info
as it is to high-frequency luminance.  (See any TV system for precedents.)
You don't have to change the color space if you don't want to.  The
remainder of the algorithm works on each color component independently,
and doesn't really care what the component is.  However, to get
optimal results you must choose the compression parameters for a
component with an eye to its interpretation.

2. (Optional) Subsample each component by averaging together groups of
pixels.  A typical choice is to leave the luminance component at full
resolution, but to subsample the color components 2:1.  (You could not get
away with this in an RGB color space...)

3. Group the pixel values for each component into 8x8 blocks.  Transform
each 8x8 block through a discrete cosine transform (DCT); this is a relative
of the Fourier transform and likewise gives a frequency map (with 8x8
components).  Thus you now have numbers representing the average value in
each block and successively higher-frequency changes within the block.  The
motivation for doing this is that you can now throw away high-frequency
information without affecting low-frequency information.  (The DCT transform
itself is reversible.)

4. In each block, divide each of the 64 frequency components by a separate
"quantization coefficient", and round the results to integers.  This is the
fundamental information-losing step.  A Q.C. of 1 loses no information,
larger Q.C.s lose successively more info.  The higher frequencies are
normally reduced more than the lower.  (All 64 Q.C.s are parameters to the
compression process; tuning them for best results is a black art.  It seems
likely that the best values are yet to be discovered.)

5. Encode the reduced coefficients using either Huffman or arithmetic
coding.  The JPEG spec defines a complex multi-state statistical model for
this coding (see my previous post about the difference between modeling and
coding).  Also, a particular variant of arithmetic coding is specified.

6. Tack on appropriate headers, etc, and output the result.  In an
"interchange" JPEG file, all of the compression parameters are included
in the headers so that the decompressor can reverse the process.  For
specialized applications, the spec permits the parameters to be omitted
from the file; this saves several hundred bytes of overhead, but means that
the decompressor must know what parameters the compressor used.


The decompression algorithm reverses this process, and typically adds some
smoothing steps to reduce pixel-to-pixel discontinuities.


As far as I know, there is no detailed specification of JPEG available
on the net.  For more information try:

Gregory Wallace, "Overview of the JPEG (ISO/CCITT) Still Image
Compression Standard", Proceedings of the SPIE Symposium on
Electronic Imaging Science and Technology, Santa Clara, CA, Feb. 1990.
In SPIE Vol 1244, Image Processing Algorithms and Techniques (1990).

``Digital Image Compression Techniques'' by M. Rabbani and P. W. Jones
(Tutorial Text Series, Vol. TT7, SPIE Press, Bellingham, WA, 1991).

I have not seen either of these...

-- 
			tom lane
Internet: tgl@cs.cmu.edu	BITNET: tgl%cs.cmu.edu@cmuccvma