Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!hsdndev!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.compression
Subject: Re: Reply to Dan Bernstein's criticisms of standard.
Message-ID: <28989.Jun2219.07.0391@kramden.acf.nyu.edu>
Date: 22 Jun 91 19:07:03 GMT
References: <873@spam.ua.oz>
Organization: IR
Lines: 73

In article <873@spam.ua.oz> ross@spam.ua.oz.au (Ross Williams) writes:
> I'm not convinced. I think that allowing algorithm parameters will
> open up a really disgusting can of worms. I see no harm in having a
> library of exact tuned algorithms.

I do.

Consider, for instance, my yabba coder. There's not only the memory
parameter -m, but a blocking test length -z and ``fuzz'' -Z that affect
compression. Twiddling -z and -Z can produce up to 10% improvements on
some types of files; there's no small set of best choices, and forcing
users into fixed values of -z and -Z, let alone -m, would be insane.

> If a user wants to create another tuning, he can just fiddle with the
> source code for the algorithm until he is happy and then fiddle the
> identity record to create a new algorithm.

Uh-uh. This contradicts your stated goal of having the identification
numbers mean something for comparison purposes.

If you want to achieve that goal, just insist that everyone name all the
parameters (in my case, -m/-z/-Z) when quoting compression results. (It
gets even hairier when you're measuring speed and memory: for yabba,
you'd have to quote MOD and PTRS as well as the machine type.) Don't
force parameterized coders into a parameterless model.

> (Another argument is that many algorithms can be implemented more
> efficiently if their parameters are statically specified).

Hardly.

> The user of the algorithm doesn't have to deal with oodles of
> information to USE an algorithm. Only if they want a non-implemented
> tuning.

One advantage of a program-based interface is that the user is given
options. If he doesn't know about the options or doesn't want to deal
with them, he doesn't use them, but they're always there for
sophisticated users. You could mimic this in your library routine by
passing in an array of options, say of { int, int } pairs, with a
method-dependent meaning.

  [ streams versus blocks ]

I think the real argument here is over whether the compressor should
drive the I/O or vice versa. In modem applications, your standard is
fine: the I/O has to drive the compressor. But in most applications it's
much easier to work the other way round.

> |>    DIRECTNESS: A stream interface will force many algorithms to become
> |>    involved in unnecessary buffering.
> |Your block interface forces *all* algorithms to become involved in
> |unnecessary blocking. Again it sounds like you're concentrating on
> |zero-delay modem compression. I don't think compressors should be forced
> |into that model.
> The interface does not force the algorithms into blocking, it forces
> the user programs into it.

No, it also forces the algorithms into blocking. Many algorithms don't
want to deal with blocks. They want to deal with streams. Here, you can
notch this up as a separate criticism: You expect algorithms to have
enough special cases that they work well if you give them blocks of
three bytes at a time. But most real LZW-style compressors work horribly
on such short blocks. Most compressors do not support plaintext or
escape codes, and it is unrealistic to expect otherwise.

> I am wary of a stream interface because it is then impossible to place
> a bound on the amount of output one could get from feeding in a single
> byte of input (during either compression or decompression).

Again it sounds like you're focusing on modems. Why?

---Dan