Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!bu.edu!transfer!lectroid!mordor!leavitt From: leavitt@mordor.hw.stratus.com (Will Leavitt) Newsgroups: comp.compression Subject: Re: Request for JPEG specification. Message-ID: <4800@lectroid.sw.stratus.com> Date: 3 Apr 91 19:55:04 GMT References: <1991Apr2.185934.6675@agate.berkeley.edu> <12583@pt.cs.cmu.edu> Sender: usenet@lectroid.sw.stratus.com Reply-To: leavitt@mordor.sw.stratus.com (Will Leavitt) Organization: Stratus Computer, Hardware Engineering Lines: 193 (reprinted from a C-Cube Microsystems information packet) JPEG ALGORITHM OVERVIEW Background of the JPEG Algorithm The obvious advantages of image compression led to the formation of an international standards group: the Joint Photographic Experts Group (JPEG). JPEG's objective has been to create a common CCITT/ISO standard for continuous tone image compresion for both color and greyscale images. With the proposed JPEG standard, a continuous range of compression can be acheived. Twenty-four bits per pixel can be reduced to one, or less, while image quality is maintained. The emerging standard defined by JPEG is designed to cover a broad range of applications. The scope and variety of target products resulted in a three-part JPEG algorithm definition: a baseline system, an extended system, and a special function, used to provide support for lossless compression. The algorithm for the baseline system is a lossy technique, based on the discrete cosine transform followed by variable word length encoding. The baseline system is the mandatory part of the system, and every JPEG codec must incorporate this function. The extended system adds features such as more sophisticated coding, lossless transmission, and progressive buildup. The compression acheived by lossy methods is much higher than with lossless techniques. Image compression factors depend on the image material and, with lossy coding, can be selected by the user in a continuous manner. For CCIR 601 type images, the JPEG algorithm typically achieves compression ratious of 24:1, yielding picture quality nearly indistinguishable form the original. Deeper compression ratios can be acheived while maintaining good image quality. Operation of the JPEG Algorithm The operation of the JPEG algorithm can be divided into three basic stages: 1) the removal of the data redundancy by means of the Discrete Cosine Transform (DCT). 2) The quantization of the DCT coefficient using weighting functions optimized for the human visual system. 3) The encoding of the data to minimize the entropy of the quantized DCT coefficients. The entropy encoding is done with a Huffman variable-word-length encoder. Although color conversion is a part of the redundancy removal process, it is not part of the JPEG algorithm. It is the goal of JPEG to be independant of the color space. JPEG handles colors as separate components. Therefore, it can be used to compress data from different color spaces, such as RGB, YUV and CMYK. However, the best compression results are achieved if the color components are independant (noncorrelated), such as YUV, where most of the information is concentrated in the luminance and less in the chrominance. RGB color components can be converted via a linear transformation into YUV compoents as shown: | Y | | 0.299 0.587 0.114 | | R | | U | = |-0.169 -0.332 0.500 | * | G | | V | | 0.500 -0.418 -0.082 | | B | Another advantage of using the YUV color space comes from reducing the spatial resolution of the U and V chrominance components. Because chrominance does not need to be specified as frequently as luminence, every other U element and every other V element can be discarded. As a consequence, a data reduction of 3 to 2 is obtained by transforming RGB into YUV 4:2:2. The conversion in color space is a first step toward compressing the image. Discrete Cosine Transform For each separate color compoent, the image is broken into 8x8 blocks that cover the entire image. These blocks form the input to the DCT. In the 8x8 blocks, typically the pixel values vary slowly. Therefore, the energy is of low spatial frequency. A transform that can be used to concentrate the energy into a few coefficients is the two-dimensional 8x8 DCT. The transform, studied extensively for image compression, is extremely efficient for highly correlated data. Conceptually, a one dimensional DCT can be thought of as taking the Fourier Transform and retaining only the real (the cosine) part. The two dimensional DCT can be obatined by performing a one-dimensional DCT on the columns and then a one-dimensional DCT on the rows. The transformed ouptut from the two-dimensional DCT is ordered such that the mean value, the DC coeeficient, is in the upper left corner of the 8x8 coefficient block and the higher frequency coefficients progress by distance from the DC coefficient. Higher vertical frequencies are represented by higher row numbers, and higher horozontal frequencies are represented by higher column numbers. Quantization The next step is the quantization of the frequency coefficients. The coefficients are quantized to reduce their magnitude and increase the number of zero-value coefficients. A uniform quantizer was selected for the JPEG baseline method. The step size is varied according to the coefficient location and tuned for each color component. The coding model rearrainges the quantized frequency coefficients into a zigzag pattern, with the lowest frequencioes first and the higher frequencies last. The zigzag pattern is used to increase the run-length of zero coefficients found in the block. The assumption is that the lower frequencies tend to have large coefficients and the high frequencies are, by the nature of most pictures, predominantly zero. The first coefficient (0,0) is called the DC coefficient, and the remaing coefficients are the AC coefficients. The AC coefficients are traversed by the zigzag pattern from the (0,1) location to the (7,7) location. The DC coefficients of subsequent blocks often vary only lsightly. Therefore, differences between successive DC coefficients are small. The coding of the DC coefficient exploints this property through Differential Pulse Code Modulation (DPCM). This technique codes the difference (Delta) between the quantized DC coefficient of the current block and the DC coefficient of the previous block. The formula for the encoding of the DC code is: Delta = DC(0,0) - DC(0,0) k k k-1 The inverse calculation takes place at the decoder. Zero Run-Length Coding The quantized AC coefficients usually contain runs of consecutive zeros. Therefore, a coding advantage can be obtained by using a run-length technique, where the upper four bits of the code symbol indicate the number of consecutive zeros before the next coefficient and the lower four bits indicate the number of significant bits in the next coefficient. Following the code symbol are the significant bits of the coefficient, the length of which can be determined by the lower four bits of the code. The inverse run-length coder translates the input coded stream into an output array of AC coefficients. It takes curent code and appends the number of zeros to the output array corresponding to the four bits used for the run-length code. The coefficient placed in the output array has the number of bits determined by the lower four bits of the run-length code and a value determined by the number of trailing bits. Entropy Encoding The block codes from the DPCM and run-length models can be further compressed using entropy encoding. For the baseline JPEG method, the Huffman coder is used to reduce entropy. One reason for using the Huffman coder is that it is easy to implement by means of a lookup table in hardware. To compress data symbols, the Huffman coder creates shorter codes for frequently ocurring symbols and longer codes for ocassionally occuring symbols. Many applications may use pre-defined Huffman tables. Therefore, the baseline encoder can operate as a one pass or two-pass system. In a one-pass system, predetermined Huffman tables are used, whereas in the two-pass system, Huffman tables are created that are specific to the image to be encoded. The first step in creating the Huffman codes is to create a table assigning a frequency count to each symbol. Symbols with a higher probability are assigned shorted codes than the less-frequently occuring symbols. Summary of Baseline JPEG The baseline system provides efficient lossy sequential image compression. It supports four color components simultaneously, with a maximum number of eight input bits for each color pixel component. The basic data entitiy is a block of 8x8 pixels. However, this bock can represent a large subsampled image area (for example, sub-sampled by decimated chrominance signals). The blocks of the different color components are sent interleaved, thereby allowing the decoder to create the decompressed image and translate back to the original color space on-the-fly. -- ............................................................................... this content-free posting has been brought to you by: Will Leavitt 508-490-6231 leavitt@mordor.hw.stratus.com