Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!crdgw1!uunet!mcsun!corton!chorus!nocturne.chorus.fr!jloup
From: jloup@nocturne.chorus.fr (Jean-Loup Gailly)
Newsgroups: comp.compression
Subject: Re: Proposed data compression interface standard.
Keywords: data compression interface standard
Message-ID: <11116@chorus.fr>
Date: 21 Jun 91 10:18:20 GMT
References: <859@spam.ua.oz>
Sender: jloup@chorus.fr
Reply-To: jloup@nocturne.chorus.fr (Jean-Loup Gailly)
Organization: Chorus systemes, Saint Quentin en Yvelines, France
Lines: 72

In article <859@spam.ua.oz>, ross@spam.ua.oz.au (Ross Williams) writes:

| 2.2 Parameters
| --------------
| 2.2.1 A conforming  procedure must have a parameter  list that conveys
| no more  and no less information  than that conveyed by  the following
| "model" parameter list.
| 
|    [...]
|    INOUT memory    - A block of memory for use by the algorithm.
|    [...]

How do you deal with segmented architectures such as the 8086 and 286
which impose a small limit (such as 64K) on the size of a segment?
(There are only a few million machines still using this architecture :-)
You could say that the MEMORY parameter is in fact a small array of
pointers to distinct segments, but how does the caller choose the
size of each segment? Systematically allocating 64K for every segment
except possibly the last would generally waste memory. (Take an algorithm
requiring two segments of 40K each.)

Even on non-segmented architectures, it may be cumbersome to force all
memory used by the algorithm to be contiguous. The data structures used
by a compression algorithm are usually much more complex than a single
linear array, so the algorithm has to somehow map these data structures onto
this linear sequence of bytes. This may be difficult with some programming
languages.
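
To make the mapping problem concrete, here is a minimal sketch in C of what
an algorithm has to do when it is handed one raw block. The names (tables,
carve_tables, align_up) and the choice of two tables are my own assumptions
for illustration, not part of the proposed standard. The sketch relies on
uintptr_t and _Alignof (C99/C11), and on integer-to-pointer conversions
whose result is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical working tables of an algorithm (names are illustrative). */
typedef struct {
    long *freq;            /* n frequency counters */
    unsigned short *prev;  /* n chain links        */
} tables;

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Carve both tables out of the caller's block.
   Returns the number of bytes used, or 0 if the block is too small.
   (No overflow check on n: this is only a sketch.) */
size_t carve_tables(void *memory, size_t size, size_t n, tables *out)
{
    uintptr_t base    = (uintptr_t)memory;
    uintptr_t freq_at = align_up(base, _Alignof(long));
    uintptr_t prev_at = align_up(freq_at + n * sizeof(long),
                                 _Alignof(unsigned short));
    uintptr_t end     = prev_at + n * sizeof(unsigned short);

    if (end - base > size)
        return 0;

    /* Implementation-defined conversions back to typed pointers. */
    out->freq = (long *)freq_at;
    out->prev = (unsigned short *)prev_at;
    return (size_t)(end - base);
}
```

Every structure added to the algorithm needs another hand-written offset and
alignment computation of this kind, and none of it carries over to a language
without such conversions.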

It is possible instead to add an INIT action which would let the compression
algorithm allocate the memory in an optimal fashion and possibly
return a failure boolean (or an Ada exception). Of course you also need
a CLOSE operation to deallocate this memory.
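
As an illustration only, the INIT and CLOSE actions could look roughly like
this in C. The names comp_state, comp_init and comp_close, and the two 40K
tables, are hypothetical; the point is merely that each table is allocated
separately, so no single block has to exceed a 64K segment.

```c
#include <assert.h>
#include <stdlib.h>

/* Two 40K tables, as in the example above (sizes are illustrative). */
#define TABLE_BYTES (40u * 1024u)

/* Hypothetical working state of one compression algorithm. */
typedef struct {
    unsigned char *hash_table;
    unsigned char *window;
} comp_state;

/* INIT: allocate the working memory. Returns NULL on failure. */
comp_state *comp_init(void)
{
    comp_state *s;

    /* Each block must fit in one segment on a segmented machine. */
    assert(TABLE_BYTES <= 65535u);

    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;

    s->hash_table = malloc(TABLE_BYTES);
    s->window     = malloc(TABLE_BYTES);
    if (s->hash_table == NULL || s->window == NULL) {
        free(s->hash_table);   /* free(NULL) is a no-op */
        free(s->window);
        free(s);
        return NULL;
    }
    return s;
}

/* CLOSE: release everything INIT allocated. Accepts NULL. */
void comp_close(comp_state *s)
{
    if (s == NULL)
        return;
    free(s->hash_table);
    free(s->window);
    free(s);
}
```

The caller tests the result of comp_init against NULL before compressing,
and calls comp_close exactly once when done.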

Another objection to the MEMORY parameter as proposed is that it is by
definition typeless. Even specific implementations in a strongly typed
programming language would have to use a general type which is not
suited to the algorithm. So the algorithm is forced to use type
conversions (casts in C, unchecked conversions in Ada) which are
generally not portable.  For example some implementations of Ada store
an array descriptor before the array data or in a separate location.
(Good implementations avoid the descriptor when the bounds are known
at compile time but this is not required by the language.) Such
descriptors must be constructed by the compiler. The implementer of
the compression algorithm has no way to magically transform a raw
sequence of bytes into a properly structured array together with its
descriptor.

If you let the algorithm allocate the data, then the standard memory
allocation routine provided by the language can be used. The resulting
pointer(s) (access values in Ada terminology) are opaque objects which
can be stored in the MEMORY parameter. A type conversion is still
necessary at this point, but it is much less troublesome. The chances
of getting back a valid pointer after the reverse conversion are much
higher. (Note that an Ada access value may be quite different from a machine
address on some implementations.) In languages supporting opaque types
such as Ada and C++ it would be preferable to get rid of all the unsafe
type conversions completely and use a different type for the MEMORY
parameter for each compression algorithm. But again this requires
the compression algorithm to export a primitive to allocate this
opaque object since the caller of the compression algorithm no longer
knows how to allocate it.
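
A rough C sketch of both variants, under my own assumptions: lz_state,
lz_alloc, lz_free, lz_dict_entries, memory_store and memory_load are
hypothetical names. The caller sees only an incomplete type, and if the
interface insists on a typeless MEMORY parameter, the pointer is stored in it
through void *, a round trip which C guarantees gives back the original
pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Caller's view: an incomplete (opaque) type; the layout is hidden. */
typedef struct lz_state lz_state;

/* Algorithm's view: the real layout, normally private to its source file. */
struct lz_state {
    size_t          dict_entries;
    unsigned short *dict;
};

/* Allocation primitive exported by the algorithm. NULL on failure. */
lz_state *lz_alloc(size_t dict_entries)
{
    lz_state *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->dict = calloc(dict_entries, sizeof *s->dict);
    if (s->dict == NULL) {
        free(s);
        return NULL;
    }
    s->dict_entries = dict_entries;
    return s;
}

/* Matching deallocation primitive. Accepts NULL. */
void lz_free(lz_state *s)
{
    if (s != NULL) {
        free(s->dict);
        free(s);
    }
}

size_t lz_dict_entries(const lz_state *s)
{
    assert(s != NULL);
    return s->dict_entries;
}

/* Typeless MEMORY parameter: only the pointer is stored in it. */
void memory_store(void **memory, lz_state *s)
{
    assert(memory != NULL);
    *memory = s;                 /* object pointer -> void * */
}

lz_state *memory_load(void **memory)
{
    assert(memory != NULL);
    return (lz_state *)*memory;  /* void * -> original pointer */
}
```

With a distinct opaque type per algorithm, memory_store and memory_load
disappear entirely and the compiler checks that the right state is passed to
the right algorithm.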

In short, I suggest that to avoid problems with segmented architectures
and/or strongly typed languages, the memory used by a compression
algorithm be allocated by the algorithm itself. The IDENTITY action
would still determine the maximum amount of memory that the algorithm
is allowed to allocate.

Jean-loup Gailly

Chorus systemes, 6 av G. Eiffel, 78182 St-Quentin-en-Yvelines-Cedex, France
email: jloup@chorus.fr    Tel: +33 (1) 30 64 82 79 Fax: +33 (1) 30 57 00 66