Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!munnari.oz.au!yoyo.aarnet.edu.au!sirius.ucs.adelaide.edu.au!spam!ross
From: ross@spam.ua.oz.au (Ross Williams)
Newsgroups: comp.compression
Subject: Reply to Dan Bernstein's criticisms of standard.
Summary: Reply to Dan Bernstein's criticisms of standard.
Keywords: proposed data compression interface standard criticism reply
Message-ID: <873@spam.ua.oz>
Date: 22 Jun 91 05:09:23 GMT
Sender: ross@spam.ua.oz
Followup-To: comp.compression
Organization: Statistics, Pure & Applied Mathematics, University of Adelaide
Lines: 101

CRIT_006: MORE ON BLOCKING
--------------------------
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Date: 21 Jun 91 20:43:38 GMT

|| To resolve  this mess, the  standard prohibits parameters.
|| The standard also forces the algorithm designer to specify how much memory
|| the algorithm will require.
|
|I find this completely unrealistic.
|
|Ross, it sounds like you're trying to specify a standard for modem
|(i.e., zero-delay) compression. But most compression in the real world
|is high-delay. In fact, someone compressing data for an archive, or for
|news batching, is much better off with control over the parameters. That
|way he can adjust them to fit his needs without swapping in an entirely
|different ``standard'' routine. He doesn't want to make the blocking
|explicit, or to deal with the oodles of information you've defined; he
|just wants to get a file or a bunch of files compressed as efficiently
|and effectively as possible.

I'm not  convinced. I  think that  allowing algorithm  parameters will
open up a  really disgusting can of  worms. I see no  harm in having a
library of exact tuned algorithms. Swapping between different standard
routines will be far easier than fiddling parameters (which can go out
of range etc).

If a user wants to create another  tuning, he can just fiddle with the
source code  for the algorithm until  he is happy and  then fiddle the
identity record to create a new  algorithm. The standard does not stop
people fiddling  with the  parameters of an  algorithm; it  just stops
them giving such vague objects standardized IDs.

(Another  argument is  that many  algorithms can  be implemented  more
efficiently if their parameters are statically specified).

The  user  of the  algorithm  doesn't  have  to  deal with  oodles  of
information to USE  an algorithm. Only if they  want a non-implemented
tuning.  In my  experience,  a handful  of  tunings  suffice for  most
algorithms.

|> However there are two very strong arguments for the block interface:
|>    EFFICIENCY: A stream interface requires one procedure call per byte
|>    processed.  For high speed applications this is unacceptable.
|
|Huh? UNIX's read() is a stream interface, and it sure doesn't require
|one procedure call per byte processed.

You're right. One can specify a block of bytes. Sorry, I got confused.

|>    DIRECTNESS: A stream interface will force many algorithms to become
|>    involved in unnecessary buffering.
|
|Your block interface forces *all* algorithms to become involved in
|unnecessary blocking. Again it sounds like you're concentrating on
|zero-delay modem compression. I don't think compressors should be forced
|into that model.

The interface does  not force the algorithms into  blocking, it forces
the user programs into it. This  is a different thing because the user
program is  in a better position  to decide how big  the blocks should
be.

I am wary of a stream interface because it is then impossible to place
a bound on the amount of output one could get from feeding in a single
byte of  input (during either  compression or decompression).  This is
more acceptable if the input and  output are hooked up to streams, but
if they are hooked up to memory there is a big problem.

Maybe a bound could be set as a parameter.

|> In contrast, a stream interface  can always be efficiently constructed
|> over the top  of a block interface.
|
|This is not true, especially given your requirement that the block
|compressor be memoryless.

The  interface  does  not  require   the  compression  routine  to  be
memoryless, only  that it close the  coding at the end  of each block.
Were  you referring  to  that,  or did  you  think  that the  standard
required contexts to be closed at the end of every ordinary block?

So where  does all  this leave us?  If the cost  of avoiding  a second
stream layer is low, I ma in favour of it. I think that everyone could
be accommodated if the standard is modified as follows:

   * Introduce a flush flag. This effectively makes the interface
     a stream interface.
   * Somehow place an upperbound on the output that an algorithm can
     generate for a given input(i.e. place a bound on buffering).
     This could be:
     - Statically specified by the standard (e.g. 10K).
     - A property of the algorithm (in the identity record).
     - A parameter to the interface.
     I am in favour of a low (e.g. 1K) static bound.
   * Retain the memory block input and output interface.

How does that sound? I'm still thinking about it.

Ross Williams
ross@spam.ua.oz.au