Xref: utzoo comp.sys.amiga.datacomm:141 alt.flame:27705
Path: utzoo!utgpu!cs.utexas.edu!sun-barr!ames!vsi1!zorch!xanthian
From: xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan)
Newsgroups: comp.sys.amiga.datacomm,alt.flame
Subject: Re: A more memory efficient compress
Message-ID: <1991Jan28.190108.13993@zorch.SF-Bay.ORG>
Date: 28 Jan 91 19:01:08 GMT
References: <20680@know.pws.bull.com>
Followup-To: comp.sys.amiga.datacomm
Organization: SF-Bay Public-Access Unix
Lines: 62


 C506634@UMCVMB.MISSOURI.EDU (Eric Edwards) writes:

> Does such a beast exist? The current version of compress (4.0) still
> wants a big chunk of contiguous ram. On my system, the largest block
> available after I boot up and start a shell is 372k. This is still not
> enough!

> Surely if lharc and Zip can run under such conditions under the same
> conditions using the same compression algorithm, compress ought to be
> able to. Actually, I wouldn't really mind if compress took 550k to
> run, just so long as it doesn't have to be contiguous.

> So. Any pointers?

Well, I used to compress files under Unix and uncompress them on my 512K
A1000 all the time; the secret is the "-b" flag.  When compressing _for_
a small memory system, or when compressing _on_ a small memory system,
use "-b14", "-b13", or even "-b12", until you get a size that works.

I'm a bit fuzzy on the details, but the ## in "-b##" is the maximum code
size in bits, so 2^## is the most entries compress will keep in its table
of strings it already knows and can point to instead of copying to the
output.  That table is what I'm calling the look-back buffer here.
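
To put rough numbers on that power of two, here is a small sketch.  The
function names are mine, and bytes_per_entry is a placeholder you would
have to fill in from your own build of compress; I am not claiming any
particular value for it.  The only facts relied on are that "-b" takes
values from 9 to 16 and that the table is bounded by 2^bits codes.

```python
def max_table_entries(bits: int) -> int:
    """Upper bound on the number of codes for a given "-b" value."""
    if not 9 <= bits <= 16:
        raise ValueError("compress accepts -b values from 9 to 16")
    return 1 << bits


def estimated_table_bytes(bits: int, bytes_per_entry: int) -> int:
    """Rough table size; bytes_per_entry is an assumption, not a known constant."""
    return max_table_entries(bits) * bytes_per_entry


if __name__ == "__main__":
    for bits in (12, 13, 14, 16):
        # Each step up in "-b" doubles the number of possible codes.
        print(f"-b{bits}: up to {max_table_entries(bits)} codes")
```

Dropping from "-b16" to "-b12" cuts the number of possible codes by a
factor of 16, which is why the smaller settings fit where the default
does not.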

Obviously, the bigger the look-back buffer, the better chance of finding
a really long string match, and so the better the potential compression.
As a result, designing a _big_ buffer into compress is a Good Thing.

However, it turns out that even "-b12" is pretty efficient compared to
the default "-b16" on a Unix system or "-b14" in the Amiga implementation
I use.

If you get a file compressed by someone _else_, your best bet is to
uncompress it and recompress it with "-b14" on your host site before
transferring the data down, just to be on the safe side.

By the way, you've mostly identified the problem you're having: the
Amiga memory is "hunky" from memory manager fragmentation, while a Unix
process gets its own clean 16M of virtual memory in which to allocate
its work buffers.  Naturally compress, being a Unix utility designed
for speed, doesn't take into account that the look-back buffer might
need to be allocated as a linked list of separately contiguous pieces,
which would slow down access and compression speed a lot.  Better to use
the "-b12" flag than to rewrite compress to run more slowly.
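
For what a "hunky" buffer would involve, here is a minimal sketch of a
table split across several separately allocated pieces.  It is my own
illustration, not anything from the compress source, and it uses a plain
list of chunks rather than a literal linked list.  ChunkedTable and
chunk_size are hypothetical names.  The point is only that every access
pays for an extra divide and an extra lookup.

```python
class ChunkedTable:
    """Fixed-size table of integers stored in several separate chunks."""

    def __init__(self, size: int, chunk_size: int):
        self.size = size
        self.chunk_size = chunk_size
        n_chunks = -(-size // chunk_size)  # ceiling division
        # Each chunk is its own allocation, so none needs to be large.
        self.chunks = [[0] * chunk_size for _ in range(n_chunks)]

    def _locate(self, index: int):
        if not 0 <= index < self.size:
            raise IndexError(index)
        # Extra work on every access: split index into (chunk, offset).
        return divmod(index, self.chunk_size)

    def __getitem__(self, index: int) -> int:
        chunk, offset = self._locate(index)
        return self.chunks[chunk][offset]

    def __setitem__(self, index: int, value: int) -> None:
        chunk, offset = self._locate(index)
        self.chunks[chunk][offset] = value


if __name__ == "__main__":
    table = ChunkedTable(1 << 16, 4096)  # 16 pieces of 4096 entries each
    table[65535] = 7
    print(table[65535], len(table.chunks))
```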

Oh, yeah, the "-b" flag isn't needed to uncompress the data; the buffer
size is a header element in the compressed data file.
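
If you want to check what a given .Z file was compressed with before
downloading it, the header can be read directly.  A sketch, assuming the
usual compress layout of two magic bytes (0x1f 0x9d) followed by a flags
byte whose low five bits hold the maximum code size; the function name
is mine.

```python
COMPRESS_MAGIC = b"\x1f\x9d"


def read_compress_header(header: bytes):
    """Return (max_bits, block_mode) from the first three bytes of a .Z file."""
    if len(header) < 3 or header[:2] != COMPRESS_MAGIC:
        raise ValueError("not a compress (.Z) file")
    flags = header[2]
    max_bits = flags & 0x1F          # the "-b" value used when compressing
    block_mode = bool(flags & 0x80)  # set when the table may be cleared mid-stream
    return max_bits, block_mode
```

Feed it the first three bytes of the file, read in binary mode.  A result
of 16 on a machine that can only manage "-b12" means the file needs to be
recompressed on the host first.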

As to lharc and zip, I don't know whether they inherently use smaller
buffers than the compress default, though they probably do, since both
had their origins within MS-DOS's 640K address space.  Obviously they
have to use buffers at least as big as the ones used on the system that
created the archive you are unpacking.  It is alternatively possible
(though much less likely) that they know how to do "hunky" buffers.

In general, your "using the same algorithm" ignores the fact that the
compress algorithm has a scaling factor controlled by the "-b" flag,
and so is really a family of algorithms with different buffer needs,
and that's where the magic is that makes things work.
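
To make the "family of algorithms" point concrete, here is a toy LZW
coder with the table size as a parameter.  It is a sketch of the general
technique only, not the compress source: it simply stops adding strings
once the table is full, and it leaves out the variable-width output codes
and table clearing that real compress does.  All names are mine.

```python
def lzw_codes(data: bytes, max_bits: int) -> list:
    """Encode data as LZW codes, keeping at most 2**max_bits table entries."""
    limit = 1 << max_bits
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate          # keep extending the known string
            continue
        codes.append(table[current])     # emit the longest known string
        if next_code < limit:            # table full: stop learning new strings
            table[candidate] = next_code
            next_code += 1
        current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list, max_bits: int) -> bytes:
    """Rebuild the data; the decoder needs a table as big as the encoder's."""
    if not codes:
        return b""
    limit = 1 << max_bits
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:          # string defined by the code being read
            entry = prev + prev[:1]
        else:
            raise ValueError("bad code %d" % code)
        out.append(entry)
        if next_code < limit:
            table[next_code] = prev + entry[:1]
            next_code += 1
        prev = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog. " * 500
    for bits in (9, 12, 16):
        n = len(lzw_codes(sample, bits))
        print(f"max_bits={bits}: {n} codes for {len(sample)} bytes")
```

The same code with a different max_bits is a different point in the
family: the decoder has to hold a table at least as large as the one the
encoder used, which is exactly the constraint on a small-memory machine.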

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>