Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!elroy.jpl.nasa.gov!decwrl!mcnc!duke!wolves!ggw
From: ggw%wolves@cs.duke.edu (Gregory G. Woodbury)
Newsgroups: news.software.b
Subject: Cnews - A small assist for managing complex sys files
Summary: sort the togo files
Message-ID: <1991Feb22.034319.9805@wolves.uucp>
Date: 22 Feb 91 03:43:19 GMT
Sender: ggw%wolves@cs.duke.edu
Reply-To: ggw%wolves@cs.duke.edu
Followup-To: news.software.b
Organization: Wolves Den UNIX
Lines: 52
X-Checksum-Snefru: 766b84af a090e189 773463cc 924efcc8

I have found that having specific subset feeds is an interesting
situation.  The sites that I feed tend to want specific groups and not
others, in non-simple patterns.

I figured that it would not be a problem to have multiple entries in
the sys file for a system, each dealing with its own subset of the
namespace, so that no single entry would hit the "line length" limits
that occasionally bite on very long sys entries.

This generates duplicated entries in the togo file for certain
crosspostings (e.g. alt.* x sci.* will duplicate if the alt groups are
in a different sys entry from the sci groups).

This, I thought, would not be a terrible situation, since the second
copy will be dropped on the floor by the receiver.  This is, indeed,
what happens: no site reports duplicate items in the spool.

It turns out, though, that among the groups one of the sites is
getting, crossposting can account for almost 25% of the articles in
the togo file!  That makes it very inefficient to use with uucp
batching.

I did find a solution.  In the $NEWSCTL/batch/batchsplit script, adding
a "sort -u" before running the togo file through the splitting awk
script removes the duplicates.  (This works because, for a crossposted
article, the file name in the togo file is always the one under the
first group in the Newsgroups header, so the duplicated lines are
identical.)

I did it with these lines in batchsplit:
---------
  # main processing
+ # wolves special!
+ # sort the input to remove duplicates
+ tmp=/tmp/bsort$$
+ cat $input | sort -u >$tmp
+ rm $input ; mv $tmp $input
+ #
+ # now, back to the regularly scheduled program
+ #
  rm -f togo.overflow togo.count
  awk 'BEGIN { total = 0 ; ninbatch = 0 ; bno = 1 ; limit = '$1'
---------

The cost of sorting should be weighed carefully against the number of
crossposts duplicated in your togo files by the sys files that you are
using.  If your sys files are small and simple, and don't use multiple
entries for any system, then you won't need to sort at all.
-- 
Gregory G. Woodbury @ The Wolves Den UNIX, Durham NC
UUCP: ...dukcds!wolves!ggw   ...mcnc!wolves!ggw           [use the maps!]
Domain: ggw@cds.duke.edu     ggw%wolves@mcnc.mcnc.org
[The line eater is a boojum snark!     ]
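
For anyone weighing the cost of the sort against the savings, here is
a rough sketch of how the effect of "sort -u" on a togo file might be
checked outside batchsplit.  Everything in it is made up for
illustration: the sample lines, the assumed "path size" layout of a
togo line, and the use of mktemp for scratch files are assumptions,
not something taken from batchsplit itself.

```shell
#!/bin/sh
# Illustration only: sample data and file handling are hypothetical,
# not part of the C News batchsplit script.
input=$(mktemp) || exit 1
tmp=$(mktemp) || exit 1

# A tiny stand-in for a togo file (assumed "path size" layout).
# Two articles appear twice, as they would if two sys entries for
# the same site both matched a crossposted article.
cat >"$input" <<'EOF'
alt/folklore/computers/1201 2310
sci/math/877 1542
alt/folklore/computers/1201 2310
comp/lang/c/4410 980
sci/math/877 1542
EOF

# Count lines before; $(( )) strips any padding that wc may add.
before=$(($(wc -l <"$input")))

# Same idea as the batchsplit change: sort, drop identical lines,
# then put the result back in place of the original file.
sort -u "$input" >"$tmp" && mv "$tmp" "$input"

after=$(($(wc -l <"$input")))
echo "togo lines before: $before, after: $after"

rm -f "$input"
```

With the sample data above this prints "togo lines before: 5, after:
3".  Running the same before/after count on a copy of a real togo file
should show whether the duplicates are numerous enough to be worth
sorting for.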