Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!decwrl!pyramid!pesnta!phri!roy
From: roy@phri.UUCP
Newsgroups: net.news.b
Subject: Re: Sorting batches helpful?
Message-ID: <2376@phri.UUCP>
Date: Wed, 18-Jun-86 18:17:19 EDT
Article-I.D.: phri.2376
Posted: Wed Jun 18 18:17:19 1986
Date-Received: Fri, 20-Jun-86 01:05:47 EDT
References: <510@mecc.UUCP>
Reply-To: roy@phri.UUCP (Roy Smith)
Distribution: na
Organization: Public Health Research Inst. (NY, NY)
Lines: 30
Keywords: sort news batch compress
Summary: Sorting makes compress work (slightly) better

In article <510@mecc.UUCP> sewilco@mecc.UUCP (Scot E. Wilcoxon) suggests
having sendbatch sort news batches by article filename.  Scot hopes that
this would lead to better disk performance on the receiving end.

Why is it that somebody always scoops me on good ideas?  Anyway, I've been
thinking about sorted batches for the past couple of weeks, but for a
different reason.  We compress our outgoing news batches.  The compress
algorithm works by finding repeated strings: it keeps one copy of a string
and, the next time it sees that string, emits a pointer back to the copy
instead (sort of; read the _Computer_ article for more details).  You could
think of compress as a software LRU cache, and caches perform better as
locality of reference increases.  So, it occurred to me, if you sort the
list of filenames to be batched, the articles from a single newsgroup will
tend to end up next to each other in the same batch.  Then all those long
included quotes just become grist for compress's cache.

"OK," you say, "that's a nifty idea, but does it really work?"  Well,
unfortunately, at this point I'm going to have to waffle a bit on the
answer.  I did a few quick tests and the results were not very
encouraging.  Two weeks ago, I took one day's worth of news and ran it
through batch and compress, with and without sorting the filenames.  The
sorted batches were indeed smaller, but only by about 5-10%.
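A rough way to see the locality effect is sketched below in Python.  It is an
illustration under stated assumptions, not the test described above: it uses
zlib's DEFLATE, an LZ77 compressor with a 32 KB sliding window, as a stand-in
for compress (which actually uses LZW), and a synthetic spool in which every
article in a group quotes the same text.  The group names, spool paths, and
article sizes are all hypothetical.  In arrival order the groups are
interleaved, so repeats of a quote sit more than 32 KB apart and fall out of
the window; sorting the filenames makes them adjacent.

```python
import random
import string
import zlib

# Sketch only: zlib (DEFLATE, 32 KB sliding window) stands in for compress(1),
# which actually uses LZW. Group names, paths, and sizes are hypothetical.


def rnews_batch(spool: dict[str, bytes], names: list[str]) -> bytes:
    """Concatenate articles in the given order, each preceded by an
    '#! rnews <size>' line, in the style of a B news batch."""
    parts = []
    for name in names:
        body = spool[name]
        parts.append(b"#! rnews %d\n" % len(body))
        parts.append(body)
    return b"".join(parts)


def make_spool(groups: int = 8, per_group: int = 6,
               seed: int = 1) -> tuple[dict[str, bytes], list[str]]:
    """Build a synthetic spool in which every article in a group quotes the
    same 4 KB of text, followed by 2 KB of new text. Returns the spool and
    the arrival order, which interleaves the groups round-robin."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "

    def text(n: int) -> str:
        # Random letters: little redundancy except the deliberate quotes.
        return "".join(rng.choices(alphabet, k=n))

    quotes = [text(4096) for _ in range(groups)]
    spool: dict[str, bytes] = {}
    arrival: list[str] = []
    for i in range(per_group):
        for g in range(groups):
            name = f"net/group{g}/{i + 1}"  # hypothetical spool path
            body = f"Newsgroups: net.group{g}\n\n> {quotes[g]}\n\n{text(2048)}\n"
            spool[name] = body.encode()
            arrival.append(name)
    return spool, arrival


if __name__ == "__main__":
    spool, arrival = make_spool()
    for label, order in (("arrival order", arrival), ("sorted", sorted(arrival))):
        batch = rnews_batch(spool, order)
        print(f"{label}: {len(batch)} bytes -> {len(zlib.compress(batch))} compressed")
```

Because the synthetic articles are built so that the quotes fit in the window
only after sorting, the gap this prints is far larger than the 5-10% seen on
real news; it shows the mechanism, not the savings to expect.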
Considering that it doesn't take much to sort a list of a few hundred
filenames, I guess this is a win, but not the big win I had hoped for.
-- 
Roy Smith, {allegra,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016