Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!munnari.oz.au!yoyo.aarnet.edu.au!sirius.ucs.adelaide.edu.au!spam!ross
From: ross@spam.ua.oz.au (Ross Williams)
Newsgroups: comp.compression
Subject: Reply to Jones's criticism of standard.
Summary: Reply to Jones's criticism of standard.
Keywords: data compression standard interface reply defence
Message-ID: <861@spam.ua.oz>
Date: 21 Jun 91 04:09:31 GMT
Sender: ross@spam.ua.oz
Followup-To: comp.compression
Organization: Statistics, Pure & Applied Mathematics, University of Adelaide
Lines: 181

Thanks for your response Douglas. Here is my reply.

CRIT_001: USE OF A SINGLE PROCEDURE
-----------------------------------

>The idea of passing the desired action to one omnibus routine that can
>both compress, expand, and perhaps wash cars is odious. It means that
>the entire baggage of compression must be included for applications
>which only decompress, and vice versa.
>
>I have yet to see any non-artificially constructed examples where I would
>want to use either the compress or expand operations at the same point
>in a program, depending on some variable. Thus, it would seem quite
>practical to use two separate procedures, one for compressing a block,
>and one for expanding an already compressed block.

The reason for using one procedure only is that, as one routine, it can
be manipulated as a first-class object in many languages. For example,
in C a pointer-to-function type can be defined, variables of which can
point to a particular data compression algorithm. This allows systems
that wish to choose between a variety of algorithms to store an array
of pointers to such functions and try out each algorithm in turn. If
more than one function is declared for each algorithm, this becomes
much messier.
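To make the point concrete, here is a minimal C sketch of the idea. All
names (the action enumeration, the procedure type, the parameter order)
are my own illustrative guesses, not the ones fixed by the draft
standard:

```c
#include <stddef.h>

/* Hypothetical action codes; the draft standard defines its own. */
typedef enum { COMPRESS, DECOMPRESS, IDENTIFY } action_t;

/* One procedure per algorithm means its address is a first-class
   value that can be stored, passed, and compared. */
typedef size_t (*compress_proc)(action_t action,
                                void *memory,
                                int fresh,
                                const unsigned char *inblock, size_t inlen,
                                unsigned char *outblock);

/* A trivial "copy" algorithm standing in for a real compressor. */
size_t copy_algorithm(action_t action, void *memory, int fresh,
                      const unsigned char *inblock, size_t inlen,
                      unsigned char *outblock)
{
    (void)action; (void)memory; (void)fresh;
    for (size_t i = 0; i < inlen; i++)
        outblock[i] = inblock[i];
    return inlen;             /* number of output bytes produced */
}

/* A table of candidate algorithms that a system can try in turn. */
compress_proc algorithms[] = { copy_algorithm };
```

With two or three routines per algorithm, this table would instead have
to hold records of function pointers, which is exactly the extra mess
the single-procedure interface avoids.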
The point about having to include (say) all the compression baggage in
a decompressor hardly matters, as it is only the compression CODE that
must be lugged about, not the working memory required by the
compression code. As most compression algorithms will use only a very
small amount of code, I don't see this as a problem.

I do not view the carwash problem as serious, as there are only three
operations and two of them use the same set of parameters. The third
operation introduces only one extra parameter.

CRIT_002: BLOCK VS STREAM
-------------------------

>> 2.2.1 A conforming procedure must have a parameter list that conveys
>> no more and no less information than that conveyed by the following
>> "model" parameter list.
>>
>>    IN    action   - The action requested of the data compression procedure.
>>    INOUT memory   - A block of memory for use by the algorithm.
>>    IN    fresh    - Is this the start of a new context block?
>>    IN    inblock  - An array of input bytes.
>>    OUT   outblock - An array of output bytes.
>>    OUT   identity - Record specifying attributes of the algorithm.
>>
>One of the biggest questions I have about this proposal is the requirement
>that the calls to the algorithm be at the block level. For many
>applications, a stream oriented interface would work better, and it
>would be easy to provide a standard stream-to-block adapter for those
>applications which prefer a block oriented interface such as the above.

I agree 100%; it took a lot of agony to arrive at the block interface.
However, there are two very strong arguments for the block interface:

EFFICIENCY: A stream interface requires one procedure call per byte
processed. For high speed applications this is unacceptable.

DIRECTNESS: A stream interface will force many algorithms to become
involved in unnecessary buffering.

In contrast, a stream interface can always be efficiently constructed
on top of a block interface. The stream interface is then also in a
better position to choose buffer sizes.
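The "stream on top of block" direction can be sketched in a few lines
of C. The block-procedure signature and the buffer size here are
illustrative assumptions, not part of the draft standard:

```c
#include <stddef.h>

#define BUF_SIZE 4096   /* buffer size chosen by the stream layer */

/* Hypothetical block-level entry point (signature is illustrative). */
typedef size_t (*block_proc)(const unsigned char *in, size_t inlen,
                             unsigned char *out);

typedef struct {
    block_proc    block;            /* underlying block algorithm */
    unsigned char inbuf[BUF_SIZE];  /* bytes accumulated so far   */
    size_t        used;             /* how many are in inbuf      */
} stream_t;

/* Append one byte; the block routine runs only when the buffer fills,
   so the per-byte cost is a few loads and stores, not a deep call. */
size_t stream_put(stream_t *s, unsigned char byte, unsigned char *out)
{
    s->inbuf[s->used++] = byte;
    if (s->used == BUF_SIZE) {
        size_t n = s->block(s->inbuf, s->used, out);
        s->used = 0;
        return n;             /* compressed bytes produced */
    }
    return 0;                 /* nothing emitted yet */
}

/* A stand-in block routine that just copies its input. */
size_t copy_block(const unsigned char *in, size_t inlen,
                  unsigned char *out)
{
    for (size_t i = 0; i < inlen; i++)
        out[i] = in[i];
    return inlen;
}
```

Note that the adapter, not the algorithm, owns the buffer and picks its
size, which is the asymmetry argued for above. The reverse adapter
(block on top of stream) cannot avoid a procedure call per byte.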
>For some inherently block-oriented compression algorithms, it might be
>equally useful to have a standard block-to-stream adapter. As in many
>object-oriented environments, we run into the classic problem here of
>which interface belongs where in the hierarchy. For some implementations
>of the compression abstraction, the block oriented interface is natural;
>for other implementations, the stream oriented interface will be more
>natural. In either case, it will be useful to have a set of adapter
>cables that allow applications written in one form to make effective
>use of algorithms coded in the other form.

Agreed 100%. However, I really believe that there is an asymmetry in
the stream/block schemes (for the reasons outlined above). I am
strongly in favour of creating a stream standard, but I think it should
be constructed as a layer above the block standard. (If you want
another example, look at UNIX IO :-).

>What I have in mind is a stream interface that has the following components:
>
>  start_compress()   -- called to indicate the start of a new context
>                        block.
>
>  compress(IN byte)  -- called to append a byte to the stream of data
>                        being input to the compression algorithm.
>
>  flush_compress()   -- called to mark the end of a context block, causing
>                        any partially accumulated compressed data to be
>                        output.
>
>  transmit(IN byte)  -- provided by the user and called by compress to
>                        append one byte to the stream of compressed data.
>
>  flush_transmit()   -- provided by the user and called by flush_compress.
>
>----
>
>  start_expand()     -- called to indicate the start of a new context
>                        block.
>
>  expand(OUT byte)   -- called by the user to get the next byte of the
>                        reconstructed input stream.
>
>  receive(OUT byte)  -- provided by the user and called by expand to
>                        get the next byte of the compressed data stream.

The interface above is a nice stream interface.
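In C, the compression half of the quoted interface might look like the
sketch below. The proposal gives names only, so the signatures, the
global state, and the placeholder bodies are my assumptions:

```c
#include <stddef.h>

/* User-provided sink for compressed bytes. */
typedef void (*transmit_fn)(unsigned char byte);

/* The coder must remember the sink somewhere: global IO state. */
static transmit_fn transmit;

void start_compress(transmit_fn t) { transmit = t; }

/* Placeholder "compression": a real coder would accumulate bits here
   and call transmit only when a whole output byte is ready. */
void compress(unsigned char byte) { transmit(byte); }

void flush_compress(void) { /* emit any partially accumulated output */ }

/* Example user sink that records what the coder emitted. */
static unsigned char emitted[64];
static size_t nemitted;
void record_byte(unsigned char b) { emitted[nemitted++] = b; }
```

Even this toy version shows the shape of the problems discussed next:
the user must supply a function of exactly the right type, and the
coder needs hidden state linking calls together.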
However, it introduces many of the problems that I was trying to avoid
in my standard:

* Making the user hand over the routines "transmit", "flush_transmit"
  and "receive" introduces a host of problems in many programming
  languages. The problems relate to:

     * Function types.
     * Function parameter list type conformance.
     * Functions as parameters to other functions.
     * Global variables and IO state.
     * Error handling (what happens if the IO operation in "transmit"
       fails?).

  In contrast, an interface of memory blocks is extremely simple,
  portable, and guaranteed not to fail.

  You might think that I am contradicting myself by criticising
  functions as first-class objects above while also being in favour of
  a single-function interface so that it can be used in a first-class
  manner. Not so. I am in favour of having one function in the
  interface so as to ALLOW functions to be easily passed about in
  languages that allow it. The stream suggestion above FORCES it,
  making things difficult in languages that do not support functions
  as first-class objects.

* This interface gives the compression algorithm the right to write
  however many bytes it feels like, whenever it feels like it. This
  makes it potentially unmanageable (see below).

* By having lots of interface functions, this alternative does not
  easily allow algorithms to be manipulated as first-class objects.

>At this level, bookkeeping for MAX_EXPANSION isn't present.

I agree, and this is an advantage in the stream interface where one is
copying streams. However, if the stream interface is being used
internally to compress from memory to memory (and the programmer has
dutifully written and handed over functions to read and write from
input and output memory buffers) then exactly the same problem arises,
except that one now has no guarantee of how much memory one should
allow in the output block.
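The MAX_EXPANSION bookkeeping amounts to a one-line sizing rule for the
output block. The particular bound below (one byte per 256 input bytes
plus a small constant) is purely illustrative; the draft standard would
fix the real figure:

```c
#include <stddef.h>

/* Hypothetical worst-case expansion bound (illustrative numbers only):
   at most 1 extra byte per 256 input bytes, plus 16 bytes of slop. */
#define MAX_EXPANSION_NUM  1
#define MAX_EXPANSION_DEN  256
#define MAX_EXPANSION_SLOP 16

/* Smallest output block that is always safe for a given input length,
   for any conforming algorithm obeying the bound above. */
size_t safe_outblock_size(size_t inlen)
{
    return inlen
         + (inlen * MAX_EXPANSION_NUM) / MAX_EXPANSION_DEN
         + MAX_EXPANSION_SLOP;
}
```

With the block interface the caller computes this once before the call;
with user-supplied write callbacks there is no such figure to compute,
which is exactly the missing guarantee complained about above.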
On a separate note, I think that it is a healthy thing to set a limit
on the expansion that a compression algorithm can yield (although this
may be restrictive in experimental contexts).

>Two things might need to be done to this:
>
>  1) for some applications, a compression or expansion context may be a
>     useful add-on parameter to all of these routines. This would allow
>     programs to deal with multiple independent streams at the same time.
>
>  2) End-of-stream during expansion isn't addressed here. The C approach
>     would be to have expand and receive return integers with the byte
>     in the low order bits and a value of -1 in case of end-of-stream.
>     The Ada approach would be to raise an end-of-stream exception.

The memory block interface does not have a difficulty with end of
stream, as it leaves that problem to the user.

------

Once again, thanks for your criticism Douglas. I look forward to more
comments from comp.compression heads.

Ross Williams
ross@spam.ua.oz.au