Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!umnd-cs!umn-cs!sundquis
From: sundquis@umn-cs.UUCP (Tom Sundquist)
Newsgroups: news.software.b
Subject: Re: Proposal: compressing news in the spool directories
Message-ID: <1493@umn-cs.UUCP>
Date: Fri, 17-Apr-87 11:38:28 EST
Article-I.D.: umn-cs.1493
Posted: Fri Apr 17 11:38:28 1987
Date-Received: Sun, 19-Apr-87 13:05:48 EST
References: <536@vixie.UUCP>
Reply-To: sundquis@umn-cs.UUCP (Tom Sundquist)
Distribution: world
Organization: University of Minnesota
Lines: 23
Keywords: proposal, compressed news, storage

In article <536@vixie.UUCP> paul@vixie.UUCP (Paul Vixie Esq) writes:
>
>The hard part is the headers -- they should not be compressed because they
>are examined independent of the (much longer) data quite often -- in expire,
>in subject searches, etc. In my view, the headers would be better left
>uncompressed. So we can either put compressed and uncompressed data in the

I computed a few statistics about news articles to help estimate how
effective such a compression scheme might be. The average article size on
our system was about 2600 bytes (this includes the large archives in the
various ``source'' newsgroups). The average header size was about 550
bytes. On average, article bodies compressed to roughly 50% of their
original size (this figure excludes the large ``source'' articles).
Hence the net compression rate would be about

    ((2600 - 550) * 50% + 550) / 2600 = 61%

That is, keeping the headers uncompressed (which is necessary) costs about
10 percentage points of compression (61% versus 50%). I don't know whether
this undermines the argument, but it needs to be considered...

Tom Sundquist
sundquis@umn-cs.arpa
rutgers!meccts!umn-cs!sundquis
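The arithmetic in the post can be checked with a few lines of code. This is a minimal sketch, assuming "compression rate" means compressed size as a fraction of original size; the function name `net_ratio` and its parameters are hypothetical, and the inputs are the averages quoted above (2600-byte articles, 550-byte headers, bodies shrinking to 50%).

```python
def net_ratio(article_bytes, header_bytes, body_ratio):
    """Fraction of the original article size left on disk when only the
    body is compressed (to body_ratio of its size) and the header is
    stored uncompressed."""
    body_bytes = article_bytes - header_bytes
    stored_bytes = body_bytes * body_ratio + header_bytes
    return stored_bytes / article_bytes


# Averages quoted in the post.
ratio = net_ratio(2600, 550, 0.50)
print(f"net ratio with uncompressed headers: {ratio:.1%}")
print(f"cost vs. a uniform 50%: {(ratio - 0.50) * 100:.1f} percentage points")
```

With these inputs it reports a net ratio of 60.6% and a cost of 10.6 percentage points, which rounds to the 61% and "about 10 points" figures in the post. Because the uncompressed header is a fixed cost, the penalty grows as articles get shorter relative to their headers.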