Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!munnari.oz.au!yoyo.aarnet.edu.au!sirius.ucs.adelaide.edu.au!spam!ross
From: ross@spam.ua.oz.au (Ross Williams)
Newsgroups: comp.compression
Subject: 1) decompression speeds, 2) rounding.
Summary: 1) decompression speeds, 2) rounding.
Keywords: data compression rounding speeds
Message-ID: <894@spam.ua.oz>
Date: 27 Jun 91 06:47:39 GMT
Sender: ross@spam.ua.oz
Followup-To: comp.compression
Organization: Statistics, Pure & Applied Mathematics, University of Adelaide
Lines: 46

Two minor notes on data compression details:

Decompression Times
-------------------
When quoting decompression speeds, is everyone agreed that the speed
should be given relative to the decompressed data? That is, if I say
that my decompression algorithm runs at 50K/second, does everyone
agree that I should mean that it is WRITING 50K/second, not READING
50K/second?
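
To make the convention concrete, here is a minimal sketch in C of a
rate measured against the output. The function name and parameters
are my own illustration, not part of any existing harness, and it
assumes the caller has already timed the run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decompression rate in bytes per second,
 * measured against the DECOMPRESSED data (bytes written), not
 * against the compressed data (bytes read). */
double decompression_rate(size_t bytes_written, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return (double)bytes_written / elapsed_seconds;
}
```

Under this convention, a decompressor that writes 102400 bytes in two
seconds runs at 51200 bytes/second (50K/second if K means 1024),
however few bytes it had to read to do so.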

Averages
--------
I recently polished my algorithm test harness on my Macintosh and ran
up against a problem with averaging.

My test harness runs the algorithm under test on a suite of files
(e.g. the Calgary corpus). For each file, it records the compression
performance (e.g. 3.48 bits per byte) in a floating point variable. It
also prints out the variable to two decimal places (as seems to have
become standard).

The problem arises when I get to the end of the corpus and want to
print out the average of the compression performances. The question
is: should I print out the average of the internal, highly accurate
floating point numbers or should I print out the average of the
printed results rounded to 2dp?

I did some analysis and worked out that the "deep" average (the
rounding of the average of the unrounded values) and the "shallow"
average (the rounding of the average of the rounded values) can differ
by up to one unit in the last rounded digit (Example: try 1.5 and 2.5
rounded to whole numbers with halves rounded up. The deep average is
2, but the shallow average is the rounding of (2+3)/2 = 2.5, which
is 3).

Choosing the deep average means that if anyone attacks my statistics
with a calculator at a later date, they might conclude that I made a
mistake. Choosing the shallow average means unnecessarily losing
information within the specified rounding range.

This problem must arise in science all the time. Is there an accepted
convention? In the end, I implemented the shallow average (the average
of the rounded results) so that nobody would think I had made a
mistake. (This implementation was made difficult by a rounding error
in the printf in my C library!!!)

Ross Williams
ross@spam.ua.oz.au