Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site ames.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxl!houxm!houxz!vax135!cornell!uw-beaver!tektronix!hplabs!ames!jaw
From: jaw@ames.UUCP (James A. Woods)
Newsgroups: net.unix-wizards,net.research
Subject: Lempel-Ziv Compression Made Swift (method of J. A. Woods), part 1
Message-ID: <425@ames.UUCP>
Date: Thu, 26-Jul-84 00:39:02 EDT
Article-I.D.: ames.425
Posted: Thu Jul 26 00:39:02 1984
Date-Received: Sat, 21-Jul-84 03:39:50 EDT
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 109

#  "It's fresh squozen daily."

	-- official slogan of the Ames fast file finder
           (yet another data compression technique)

     Herein we report on a simple and fleet implementation of the LZW algorithm
now making the rounds on UNIX.  For equivalent compression rates (2:1 for
unix-wizards, 3:1 and more for numerical data), it runs at least 1.6 times
faster for large files than the code offered by Spencer Thomas.  This
is a C language comparison only; a greater speed differential is expected
if both versions were written in machine language).  This method may be
impractical for output codes of length greater than 13 bits, though,
fortunately, the basic algorithm offers only diminishing returns beyond this.

     Like Mr. Thomas, I adapted the Welch implementation to a different
data structure, believing that the hashing method reported in the June IEEE
Computer to be "too slow".  Essentially, the code below is a direct-access
lookup of the alternate string table pictured in the article.  There are
2 ** BITS codes which serve as prefixes to be associated with any of 256 ascii
characters.  For the typical BITS == 12, this looks unwieldy, but is actually
workable on a 32-bit processor with the enhancements I describe.  A first
pass took the form:

	short table [(1 << BITS) * 256];   /* 2 megabytes for BITS = 12 */

	squash ( )
	{
		register int K, i, p, prefix;
		register long code;

		bzero ( table, 1 << BITS );	/* clear table, but see below */

		for ( i = 0; i < 256; i++ ) 
			table [i] = i; 
	
		prefix = getchar ( );

		while ( (K = getchar ( )) != EOF ) {
			code = (long) ((K << BITS) + prefix);
			/*
		   	    test for code in "string" table
			*/
			if ( (p = table [code]) != 0 )
				prefix = p;
			else {			/* put code in table */
				output ( prefix );
				prefix = K;
				if ( i < (1 << BITS) )    /* ran out of codes */
					table [code] = (short) i++;
			}
		}
		output ( prefix );		/* last one */
	}

The table initialization time (Is bcopy() the fastest way to do this in C?)
is disconcerting, some 6 seconds of system time (2 MB) on our 11/750.  This is
unacceptable for small files.  It turns out that, using a variant of the
sparse array initialization trick discussed by J. Bentley (CACM, 8/83), we
can associate a bit-vector with the large code table, and save time by
setting a bit in this map before each read access to the table, and by
testing the bit before each write.

     The macros provided in M. D. McIlroy's now defunct V7 "spell" code,
can be used for these bit tests:

	#define SHIFT	5
	#define get(h)	(bits[h>>SHIFT]&(1<<((long)h&((1<<SHIFT)-1))))
	#define set(h)	bits[h>>SHIFT] |= 1<<((long)h&((1<<SHIFT)-1))

How's that for spiff?  It compiles into just a few VAX instructions and
beats the mod operator hands down.  If you give up, look at p. 47 in K & R.
Though probabilistic Bloom filtering for spelling error detection fades
away, the bit macros live on!  Integrating this in the above fragment is 
omitted here for clarity.

     Another subtle point:  originally the indexing

	code = (long) ((K << BITS) + prefix);

was, keeping in spirit with Welch's alternate string table exposition:

	code = (long) ((prefix << BITS) + K);

But we discovered that (on a virtual memory machine at least), the first
form easily halves (!) system and user time (and space) for text files,
since array accesses, in the absence of parity marked characters, neatly
clusters things in the top half (one meg here) of the array.

     Just for a lark, I also wired in special-case code to keep
prefix / character 040 (space) pairs in their own fast storage, since
most net English [sic] (and also numerical tables) are 15% space.  With
these modifications and using Thomas's exact output() routine but with a doubled
fwrite() buffer size, the following obtains:

			Input:   200 KB unix-wizards
			Machine: VAX 11/750, BSD 4.2

	program	             user    system   % compression
                             time     time

	pack                 15.2      4.4        34.0	      Sys V.1 port
        compress (16 bit)    45.2     13.2        59.6
	compact             221.2      4.9        34.0
	compress (12 bit)    35.0      1.4        43.1        assembler output()
	compress (13 bit)    35.7      1.8        48.3		  function   
	squash   (12 bit)    18.4      2.9        43.1                "
	squash   (13 bit)    19.1      4.5        48.3	              "

Comments, largely of a philosophical nature, are continued in part 2 ...