Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!brutus.cs.uiuc.edu!coolidge
From: coolidge@brutus.cs.uiuc.edu (John Coolidge)
Newsgroups: news.software.b
Subject: Re: Lots of dups
Summary: Even C News isn't fast enough :-( (really cron's fault...)
Keywords: cnews, nntp, dups
Message-ID: <1989Oct25.205129.16397@brutus.cs.uiuc.edu>
Date: 25 Oct 89 20:51:29 GMT
References: <1989Oct25.164024.14894@ctr.columbia.edu>
Sender: news@brutus.cs.uiuc.edu
Reply-To: coolidge@cs.uiuc.edu
Distribution: na
Organization: U of Illinois, CS Dept., Systems Research Group
Lines: 41

seth@ctr.columbia.edu (Seth Robertson) writes:
>I run a Cnews machine with a few high-speed (NNTP) feeds. My problem is that
>two of them have excessive (> 80%) duplication.
>Here is part of one days summary file:
>                              %       %       %
>Host Name                  from      to     dup    from      to     dup
>======================= ======= ======= ======= ======= ======= =======
>cica                     31.940  67.252  25.565    1383    2912     475
>ginosko                  11.732  32.841  80.873     508    1422    2148
>gem.mps.ohio-state.edu   12.956  75.404  77.640     561    3265    1948
>uakari.primate.wisc.edu  37.460   0.000  21.262    1622       0     438
>-----------------------------------------------------------------------
>Does anyone have any suggestions? Thanks.

The problem is that with REALLY fast feeds, even processing articles
once a minute is not fast enough. The problem lies in nntpd accepting
multiple copies because the queue hasn't been run yet: until it is, the
first copy isn't in history, so nntpd has no way to know it already has
the article.

There are a couple of possible solutions: have nntpd hand articles
straight to relaynews (cuts out dups like mad, but really messes up
performance), or find a way to make newsrun run more than once a minute.
I've adopted the second solution by writing a newsrun daemon in perl
that runs about every 10 sec (alas, my code is nowhere near the point
where I'd release it).

There are also a couple of possible solutions which require a lot of
effort. One is to have nntpd log articles as received, and check against
both that log and history before taking in a new article.
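
A minimal sketch of that log-plus-history check, in Python for illustration
only (it is not nntpd or C News code). Every name here (ReceivedLog, wanted,
accept, forget, max_age) is hypothetical, history is modelled as a plain set
of Message-IDs, and the log lives in memory; a real server would presumably
need it in a file shared between server processes. The age-based expiry is an
added assumption, meant as a fallback for entries that are never cleared.

```python
import time


class ReceivedLog:
    """Message-IDs accepted from feeds but not yet filed by the queue run.

    Hypothetical illustration; not part of nntpd or C News.
    """

    def __init__(self, max_age=3600.0, clock=time.time):
        self._seen = {}            # Message-ID -> time it was accepted
        self._max_age = max_age    # assumed fallback expiry, in seconds
        self._clock = clock

    def wanted(self, msgid, history):
        """True if msgid is in neither history nor the received log."""
        self._expire()
        return msgid not in history and msgid not in self._seen

    def accept(self, msgid):
        """Record that an article with this Message-ID has been queued."""
        self._seen[msgid] = self._clock()

    def forget(self, msgid):
        """Drop an entry whose article failed later (full disk, say),
        so that another feed may offer it again."""
        self._seen.pop(msgid, None)

    def _expire(self):
        """Remove entries older than max_age."""
        cutoff = self._clock() - self._max_age
        for msgid in [m for m, t in self._seen.items() if t < cutoff]:
            del self._seen[msgid]


# Two feeds offer overlapping articles before the queue has been run.
history = {"<a@site>"}             # already filed
log = ReceivedLog()
taken = []
for msgid in ["<a@site>", "<b@site>", "<b@site>", "<c@site>"]:
    if log.wanted(msgid, history):
        log.accept(msgid)
        taken.append(msgid)
```

Here only the first offer of <b@site> is taken; the second is refused even
though history has not been updated yet. The forget() call is the hard part
in practice, since something downstream has to report the failure back.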
The problem comes in telling nntpd that an article it logged has failed
somewhere else (because of disk space, for instance). Alternatively,
relaynews could be rebuilt as a daemon which takes articles as presented
and writes them. This has a problem with communication and with handling
crashes.

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.