Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!munnari.oz.au!yoyo.aarnet.edu.au!sirius.ucs.adelaide.edu.au!spam!ross From: ross@spam.ua.oz.au (Ross Williams) Newsgroups: comp.compression Subject: Reply to Dan Bernstein's criticisms of standard. Summary: Reply to Dan Bernstein's criticisms of standard. Keywords: proposed data compression interface standard criticism reply Message-ID: <873@spam.ua.oz> Date: 22 Jun 91 05:09:23 GMT Sender: ross@spam.ua.oz Followup-To: comp.compression Organization: Statistics, Pure & Applied Mathematics, University of Adelaide Lines: 101 CRIT_006: MORE ON BLOCKING -------------------------- From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein) Date: 21 Jun 91 20:43:38 GMT || To resolve this mess, the standard prohibits parameters. || The standard also forces the algorithm designer to specify how much memory || the algorithm will require. | |I find this completely unrealistic. | |Ross, it sounds like you're trying to specify a standard for modem |(i.e., zero-delay) compression. But most compression in the real world |is high-delay. In fact, someone compressing data for an archive, or for |news batching, is much better off with control over the parameters. That |way he can adjust them to fit his needs without swapping in an entirely |different ``standard'' routine. He doesn't want to make the blocking |explicit, or to deal with the oodles of information you've defined; he |just wants to get a file or a bunch of files compressed as efficiently |and effectively as possible. I'm not convinced. I think that allowing algorithm parameters will open up a really disgusting can of worms. I see no harm in having a library of exact tuned algorithms. Swapping between different standard routines will be far easier than fiddling parameters (which can go out of range etc). If a user wants to create another tuning, he can just fiddle with the source code for the algorithm until he is happy and then fiddle the identity record to create a new algorithm. The standard does not stop people fiddling with the parameters of an algorithm; it just stops them giving such vague objects standardized IDs. (Another argument is that many algorithms can be implemented more efficiently if their parameters are statically specified). The user of the algorithm doesn't have to deal with oodles of information to USE an algorithm. Only if they want a non-implemented tuning. In my experience, a handful of tunings suffice for most algorithms. |> However there are two very strong arguments for the block interface: |> EFFICIENCY: A stream interface requires one procedure call per byte |> processed. For high speed applications this is unacceptable. | |Huh? UNIX's read() is a stream interface, and it sure doesn't require |one procedure call per byte processed. You're right. One can specify a block of bytes. Sorry, I got confused. |> DIRECTNESS: A stream interface will force many algorithms to become |> involved in unnecessary buffering. | |Your block interface forces *all* algorithms to become involved in |unnecessary blocking. Again it sounds like you're concentrating on |zero-delay modem compression. I don't think compressors should be forced |into that model. The interface does not force the algorithms into blocking, it forces the user programs into it. This is a different thing because the user program is in a better position to decide how big the blocks should be. I am wary of a stream interface because it is then impossible to place a bound on the amount of output one could get from feeding in a single byte of input (during either compression or decompression). This is more acceptable if the input and output are hooked up to streams, but if they are hooked up to memory there is a big problem. Maybe a bound could be set as a parameter. |> In contrast, a stream interface can always be efficiently constructed |> over the top of a block interface. | |This is not true, especially given your requirement that the block |compressor be memoryless. The interface does not require the compression routine to be memoryless, only that it close the coding at the end of each block. Were you referring to that, or did you think that the standard required contexts to be closed at the end of every ordinary block? So where does all this leave us? If the cost of avoiding a second stream layer is low, I ma in favour of it. I think that everyone could be accommodated if the standard is modified as follows: * Introduce a flush flag. This effectively makes the interface a stream interface. * Somehow place an upperbound on the output that an algorithm can generate for a given input(i.e. place a bound on buffering). This could be: - Statically specified by the standard (e.g. 10K). - A property of the algorithm (in the identity record). - A parameter to the interface. I am in favour of a low (e.g. 1K) static bound. * Retain the memory block input and output interface. How does that sound? I'm still thinking about it. Ross Williams ross@spam.ua.oz.au