Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!bu.edu!transfer!lectroid!mordor!leavitt
From: leavitt@mordor.hw.stratus.com (Will Leavitt)
Newsgroups: comp.compression
Subject: Re: Request for JPEG specification.
Message-ID: <4800@lectroid.sw.stratus.com>
Date: 3 Apr 91 19:55:04 GMT
References: <1991Apr2.185934.6675@agate.berkeley.edu> <12583@pt.cs.cmu.edu>
Sender: usenet@lectroid.sw.stratus.com
Reply-To: leavitt@mordor.sw.stratus.com (Will Leavitt)
Organization: Stratus Computer, Hardware Engineering
Lines: 193

(reprinted from a C-Cube Microsystems information packet)


JPEG ALGORITHM OVERVIEW

Background of the JPEG Algorithm

The obvious advantages of image compression led to the formation of an
international standards group: the Joint Photographic Experts Group
(JPEG).  JPEG's objective has been to create a common CCITT/ISO
standard for continuous tone image compresion for both color and
greyscale images.  With the proposed JPEG standard, a continuous range
of compression can be acheived.  Twenty-four bits per pixel can be
reduced to one, or less, while image quality is maintained.

The emerging standard defined by JPEG is designed to cover a broad
range of applications. The scope and variety of target products
resulted in a three-part JPEG algorithm definition: a baseline system,
an extended system, and a special function, used to provide support
for lossless compression.

The algorithm for the baseline system is a lossy technique, based on
the discrete cosine transform followed by variable word length
encoding.  The baseline system is the mandatory part of the system,
and every JPEG codec must incorporate this function.

The extended system adds features such as more sophisticated coding,
lossless transmission, and progressive buildup.

The compression acheived by lossy methods is much higher than with
lossless techniques.  Image compression factors depend on the image
material and, with lossy coding, can be selected by the user in a
continuous manner.  For CCIR 601 type images, the JPEG algorithm
typically achieves compression ratious of 24:1, yielding picture
quality nearly indistinguishable form the original.  Deeper
compression ratios can be acheived while maintaining good image
quality.


Operation of the JPEG Algorithm

The operation of the JPEG algorithm can be divided into three basic stages:

  1) the removal of the data redundancy by means of the Discrete Cosine
     Transform (DCT).

  2) The quantization of the DCT coefficient using weighting functions
     optimized for the human visual system.

   3) The encoding of the data to minimize the entropy of the quantized
      DCT coefficients.  The entropy encoding is done with a Huffman
      variable-word-length encoder.

Although color conversion is a part of the redundancy removal process,
it is not part of the JPEG algorithm.  It is the goal of JPEG to be
independant of the color space.  JPEG handles colors as separate
components.  Therefore, it can be used to compress data from different
color spaces, such as RGB, YUV and CMYK.

However, the best compression results are achieved if the color
components are independant (noncorrelated), such as YUV, where most of
the information is concentrated in the luminance and less in the
chrominance.  RGB color components can be converted via a linear
transformation into YUV compoents as shown:

   | Y |       | 0.299  0.587  0.114 |   | R |
   | U |   =   |-0.169 -0.332  0.500 | * | G |
   | V |       | 0.500 -0.418 -0.082 |   | B |

Another advantage of using the YUV color space comes from reducing the
spatial resolution of the U and V chrominance components.  Because
chrominance does not need to be specified as frequently as luminence,
every other U element and every other V element can be discarded.  As
a consequence, a data reduction of 3 to 2 is obtained by transforming
RGB into YUV 4:2:2.  The conversion in color space is a first step
toward compressing the image.


Discrete Cosine Transform

For each separate color compoent, the image is broken into 8x8 blocks
that cover the entire image.  These blocks form the input to the DCT.

In the 8x8 blocks, typically the pixel values vary slowly.  Therefore,
the energy is of low spatial frequency.  A transform that can be used
to concentrate the energy into a few coefficients is the
two-dimensional 8x8 DCT.  The transform, studied extensively for image
compression, is extremely efficient for highly correlated data.

Conceptually, a one dimensional DCT can be thought of as taking the
Fourier Transform and retaining only the real (the cosine) part.  The
two dimensional DCT can be obatined by performing a one-dimensional
DCT on the columns and then a one-dimensional DCT on the rows.  The
transformed ouptut from the two-dimensional DCT is ordered such that
the mean value, the DC coeeficient, is in the upper left corner of the
8x8 coefficient block and the higher frequency coefficients progress
by distance from the DC coefficient.  Higher vertical frequencies are
represented by higher row numbers, and higher horozontal frequencies
are represented by higher column numbers.


Quantization

The next step is the quantization of the frequency coefficients.  The
coefficients are quantized to reduce their magnitude and increase the
number of zero-value coefficients.  A uniform quantizer was selected
for the JPEG baseline method.  The step size is varied according to
the coefficient location and tuned for each color component.

The coding model rearrainges the quantized frequency coefficients into
a zigzag pattern, with the lowest frequencioes first and the higher
frequencies last.  The zigzag pattern is used to increase the
run-length of zero coefficients found in the block.  The assumption is
that the lower frequencies tend to have large coefficients and the
high frequencies are, by the nature of most pictures, predominantly
zero.  The first coefficient (0,0) is called the DC coefficient, and
the remaing coefficients are the AC coefficients.  The AC coefficients
are traversed by the zigzag pattern from the (0,1) location to the
(7,7) location.

The DC coefficients of subsequent blocks often vary only lsightly.
Therefore, differences between successive DC coefficients are small.
The coding of the DC coefficient exploints this property through
Differential Pulse Code Modulation (DPCM).  This technique codes the
difference (Delta) between the quantized DC coefficient of the current
block and the DC coefficient of the previous block.  The formula for
the encoding of the DC code is:

   Delta  = DC(0,0)  -  DC(0,0)
        k          k           k-1

The inverse calculation takes place at the decoder.


Zero Run-Length Coding

The quantized AC coefficients usually contain runs of consecutive
zeros.  Therefore, a coding advantage can be obtained by using a
run-length technique, where the upper four bits of the code symbol
indicate the number of consecutive zeros before the next coefficient
and the lower four bits indicate the number of significant bits in the
next coefficient.

Following the code symbol are the significant bits of the coefficient,
the length of which can be determined by the lower four bits of the
code.  The inverse run-length coder translates the input coded stream
into an output array of AC coefficients.  It takes curent code and
appends the number of zeros to the output array corresponding to the
four bits used for the run-length code.  The coefficient placed in the
output array has the number of bits determined by the lower four bits
of the run-length code and a value determined by the number of
trailing bits.


Entropy Encoding

The block codes from the DPCM and run-length models can be further
compressed using entropy encoding.  For the baseline JPEG method, the
Huffman coder is used to reduce entropy.  One reason for using the
Huffman coder is that it is easy to implement by means of a lookup
table in hardware.  To compress data symbols, the Huffman coder
creates shorter codes for frequently ocurring symbols and longer codes
for ocassionally occuring symbols.  Many applications may use
pre-defined Huffman tables.  Therefore, the baseline encoder can
operate as a one pass or two-pass system.  In a one-pass system,
predetermined Huffman tables are used, whereas in the two-pass system,
Huffman tables are created that are specific to the image to be
encoded.

The first step in creating the Huffman codes is to create a table
assigning a frequency count to each symbol.  Symbols with a higher
probability are assigned shorted codes than the less-frequently
occuring symbols.


Summary of Baseline JPEG

The baseline system provides efficient lossy sequential image
compression.  It supports four color components simultaneously, with
a maximum number of eight input bits for each color pixel component.

The basic data entitiy is a block of 8x8 pixels.  However, this bock
can represent a large subsampled image area (for example, sub-sampled
by decimated chrominance signals).  The blocks of the different color
components are sent interleaved, thereby allowing the decoder to
create the decompressed image and translate back to the original color
space on-the-fly.

--
...............................................................................
this content-free posting has been brought to you by:
                                           Will Leavitt     508-490-6231
                                           leavitt@mordor.hw.stratus.com