Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!munnari.oz.au!yoyo.aarnet.edu.au!sirius.ucs.adelaide.edu.au!spam!ross
From: ross@spam.ua.oz.au (Ross Williams)
Newsgroups: comp.compression
Subject: 1) decompression speeds, 2) rounding.
Summary: 1) decompression speeds, 2) rounding.
Keywords: data compression rounding speeds
Message-ID: <894@spam.ua.oz>
Date: 27 Jun 91 06:47:39 GMT
Sender: ross@spam.ua.oz
Followup-To: comp.compression
Organization: Statistics, Pure & Applied Mathematics, University of Adelaide
Lines: 46

Two minor notes on data compression details:

Decompression Times
-------------------

When quoting decompression speeds, is everyone agreed that the speed should be given relative to the decompressed data? That is, if I say that my decompression algorithm runs at 50K/second, does everyone agree that I should mean it is WRITING 50K/second, not READING 50K/second?

Averages
--------

I recently polished my algorithm test harness on my Macintosh and ran up against a problem with averaging. My test harness runs the algorithm under test on a suite of files (e.g. the Calgary corpus). For each file, it records the compression performance (e.g. 3.48 bits per byte) in a floating-point variable. It also prints the variable to two decimal places (as seems to have become standard).

The problem arises when I reach the end of the corpus and want to print the average of the compression performances. The question is: should I print the average of the internal, highly accurate floating-point numbers, or the average of the printed results rounded to 2dp?

I did some analysis and worked out that the "deep" average (the rounding of the average of the unrounded values) and the "shallow" average (the rounding of the average of the rounded values) can differ by up to one unit in the last rounded digit. (Example: try 1.5 and 2.5, rounding halves up to whole numbers. The deep average is round(2.0) = 2; the shallow average is round((2 + 3) / 2) = round(2.5) = 3.)
Choosing the deep average means that if anyone attacks my statistics with a calculator at a later date, they might conclude that I made a mistake. Choosing the shallow average means unnecessarily losing information within the specified rounding range.

This problem must arise in science all the time. Is there an accepted convention?

When pressed, I ended up implementing the shallow average (the average of the rounded results) to avoid people thinking I had made a mistake. (This implementation was made difficult by a rounding error in the printf in my C library!!!)

Ross Williams
ross@spam.ua.oz.au