Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucsd!pacbell.com!pacbell!hoptoad!hsfmsh!daemon
From: tnixon@hsfmsh.UUCP (Toby Nixon)
Newsgroups: comp.dcom.modems
Subject: Re: Data Compression
Message-ID: <3505@hsfmsh.UUCP>
Date: 12 Jul 90 15:39:24 GMT
Sender: daemon@hsfmsh.UUCP
Organization: Hayes Microcomputer Products, Inc. Norcross, Georgia
Lines: 65

In article <5084@mace.cc.purdue.edu>, Robert S. Unoki writes:
- I'm currently considering purchasing a modem with data compression
- for my own personal use. However, I am unclear on exactly how the
- data compression works. I understand that all compression is done at
- the hardware level to effectively increase throughput by factors of
- 2:1 (MNP5) or 4:1 (V.42bis). I also realize that modems on both
- ends of the connection must share the same compression algorithms.

Briefly, MNP5 works by keeping counts of the frequency of occurrence of
individual characters in the data stream. A table is kept sorted so
that the most frequently-occurring characters appear at the beginning
of the table. To transmit a character, you transmit its POSITION in
this table, Huffman-coded (the lowest values take four bits to send;
the highest values take 12 bits to send). If your data is made up of
only characters which appear so frequently that their position can be
sent in 4 bits, then you get 2-to-1 compression. In reality, English
text compresses to an average of about 1.6-to-1 with MNP5, but when you
combine this with the stripping of start and stop bits done by MNP4
(and V.42 LAPM), you can see 2-to-1 throughput (but it's dependent on
the data you're sending).

V.42bis uses an entirely different technique, commonly known as
Lempel-Ziv-Welch. It builds a tree-structured linked list of strings
of characters, constantly adding new characters to extend the length of
existing strings and "pruning" infrequently-referenced "leaf nodes" to
recover places to put them.
A string is transmitted by sending the position in the tree of the LAST
character in the string; the receiver recovers the data by following
the links up the tree to the "root node" (first character). A string
can be from 1 to 250 characters in length, and in normal English text,
depending on the maximum number of nodes you have storage for, you can
get an average string length of somewhere around 4-5 characters, giving
4-to-1 compression when combined with start- and stop-bit stripping (it
takes about 12 bits to send the position in the dictionary).

I can go into this in more detail if you like. But remember this
essential difference: MNP5 takes FIXED-LENGTH objects and sends them
using a VARIABLE-LENGTH code; V.42bis takes VARIABLE-LENGTH objects and
sends them using a FIXED-LENGTH code.

- I intend to purchase a 2400 modem that operates using either of the
- above compression schemes. Does this mean that I will basically
- have a 4800 baud connection using MNP5 or 9600 baud using V.42bis?
- Would these be the settings of my communications software?

The actual throughput you see is dependent on the redundancy
(compressibility) of the data. The 2-to-1 and 4-to-1 are for English
text (like this; lower-case, lots of spaces, fairly normal vocabulary,
etc.) If you're sending binary files or previously-compressed data
(like news feeds), you won't see that level of compression (if any).
But for interactive work, the compression definitely is an advantage.

-- Toby
-----------------------------------------------------------------------------
Toby Nixon, Principal Engineer    Fax:   +1-404-441-1213  Telex: 6502670805
Hayes Microcomputer Products Inc. Voice: +1-404-449-8791  CIS:   70271,404
Norcross, Georgia, USA            BBS:   +1-404-446-6336  MCI:   TNIXON
Telemail: T.NIXON/HAYES  AT&T: !tnixon  UUCP: ...!uunet!hayes!tnixon
Internet: hayes!tnixon@uunet.uu.net
MHS: C=US / AD=ATTMAIL / PN=TOBY_L_NIXON / DD=TNIXON
-----------------------------------------------------------------------------
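
For anyone who wants to poke at the "essential difference" described in
the post, here is a rough Python sketch of both ideas. It is only an
illustration under stated assumptions, not either protocol as
specified. The function names (mnp5_style_bits, lzw_encode, lzw_decode)
are made up for this example. The bit-cost schedule in mnp5_style_bits
is hypothetical (4 bits for the first few table positions, growing with
position) and is not the real MNP5 token table. The second half is
plain textbook LZW with an unbounded dictionary and no pruning, so it
shows the parent-link walk back to the root but not the actual V.42bis
procedure.

```python
from collections import Counter


def mnp5_style_bits(data: bytes) -> int:
    """Estimate the bits needed when each byte is sent as its table position."""
    counts = Counter()
    table = list(range(256))  # byte values, most frequent first
    total = 0
    for byte in data:
        rank = table.index(byte)
        # ASSUMPTION: illustrative cost schedule only (4 bits for the first
        # four positions, more for later ones); not the real MNP5 token table.
        total += 4 if rank < 4 else 2 + rank.bit_length()
        counts[byte] += 1
        # Re-sort so frequent bytes move to the front (stable: ties keep order).
        table.sort(key=lambda b: -counts[b])
    return total


def lzw_encode(data: bytes) -> list:
    """Plain LZW: emit the tree node of the LAST character of each string."""
    children = {}    # (parent_node, byte) -> node
    next_node = 256  # nodes 0..255 are the single-character roots
    out = []
    if not data:
        return out
    node = data[0]
    for byte in data[1:]:
        key = (node, byte)
        if key in children:
            node = children[key]       # extend the current string
        else:
            out.append(node)           # send the longest known string
            children[key] = next_node  # add string + byte as a new leaf
            next_node += 1
            node = byte
    out.append(node)
    return out


def lzw_decode(codes) -> bytes:
    """Rebuild the data by following parent links up to the root node."""
    parent = {}      # node -> (parent_node, last_byte)
    next_node = 256

    def expand(node):
        chars = []
        while node >= 256:
            node, byte = parent[node]
            chars.append(byte)
        chars.append(node)             # the root is the first character
        return bytes(reversed(chars))

    out = bytearray()
    prev = None
    for code in codes:
        if prev is not None:
            # New leaf = previous string + first character of this string.
            # If this code is the leaf being created right now, that first
            # character is the first character of the previous string.
            src = prev if code == next_node else code
            parent[next_node] = (prev, expand(src)[0])
            next_node += 1
        out += expand(code)
        prev = code
    return bytes(out)
```

With this toy cost schedule, mnp5_style_bits(b"a" * 400) comes to 1605
bits against 3200 bits of raw data, which is the "everything fits in 4
bits" case approaching 2-to-1. For the LZW half, a fixed 12-bit code
per entry would give a ratio of roughly 8 * len(data) divided by
12 * len(lzw_encode(data)), again assuming nothing about how a real
modem sizes or prunes its dictionary.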