Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!zaphod.mps.ohio-state.edu!caen!ox.com!emv
From: paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO)
Newsgroups: comp.archives
Subject: [nntp] Re: Argh! Duplicates abound!
Message-ID: <1990Dec5.164512.25643@ox.com>
Date: 5 Dec 90 16:45:12 GMT
References: <1990Nov29.081325.8461@cs.widener.edu> <1990Dec5.034523.21682@ux1.cso.uiuc.edu>
Sender: emv@ox.com (Edward Vielmetti)
Reply-To: paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO)
Followup-To: news.software.nntp
Organization: University of Illinois at Urbana
Lines: 46
Approved: emv@ox.com (Edward Vielmetti)
X-Original-Newsgroups: news.software.nntp

Archive-name: news/nntp/msgidd/1990-12-05
Archive: uxc.cso.uiuc.edu:/pub/nntp-1.5.10+.tar.Z [128.174.5.50]
Original-posting-by: paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO)
Original-subject: Re: Argh! Duplicates abound!
Reposted-by: emv@ox.com (Edward Vielmetti)

brendan@cs.widener.edu (Brendan Kehoe) writes:

> I'm having a strange problem and I've exhausted all of the
>possibilities that I can think of.
> I recently added a second feed (both are partial, but with nearly
>exactly the same groups) to my newsfeed. Now everytime the second site
>connects, a whole load of articles show up as duplicates (rather than
>be rejected). Syd Weinstein (one of the folks giving me a feed)
>suggested perhaps I'd built nntp wrong, since it may be reading the
>history file wrong.

I assume you are using cnews since you mention dbz. What is likely
happening is that while cnews is chunking away at a batch from site #1,
nntpd accepts a batch of the same articles from site #2 before the
message-ids in batch #1 make it into the history file. When cnews
finishes with batch #1, many of the articles in batch #2 will be rightly
rejected as duplicates. This all comes about with the fast propagation
rate for Internet NNTP sites.
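
The window described above can be sketched in a few lines of Python. This is only a toy model, not cnews or nntpd code: HistoryFile, offer, process_spool and the example message-ids are all made-up names for illustration. The one assumption it encodes is that the receiving side checks only the history file, and that an id reaches the history file only once its batch has been processed.

```python
class HistoryFile:
    """Toy stand-in for the history file: ids of fully processed articles."""

    def __init__(self):
        self._ids = set()

    def contains(self, msgid):
        return msgid in self._ids

    def add(self, msgid):
        self._ids.add(msgid)


def offer(history, spool, feed, msgids):
    """Receiving side: accept every article whose id is not yet in history."""
    accepted = []
    for msgid in msgids:
        if not history.contains(msgid):
            spool.append((feed, msgid))  # queued, but NOT yet in history
            accepted.append(msgid)
    return accepted


def process_spool(history, spool):
    """Batch side: file queued articles, rejecting ids already in history."""
    filed, duplicates = [], []
    for feed, msgid in spool:
        if history.contains(msgid):
            duplicates.append((feed, msgid))
        else:
            history.add(msgid)
            filed.append((feed, msgid))
    spool.clear()
    return filed, duplicates


def demo():
    history, spool = HistoryFile(), []
    ids = ["<1@example>", "<2@example>", "<3@example>"]
    offer(history, spool, "site1", ids)
    # site2 connects before the first batch has been processed
    accepted_from_site2 = offer(history, spool, "site2", ids)
    filed, duplicates = process_spool(history, spool)
    return accepted_from_site2, filed, duplicates
```

In this model site2's copies are all accepted for transfer and only thrown away later, at batch time, which matches the symptom in the quoted question.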

Our news machine, ux1.cso.uiuc.edu, has six outside feeds and was always
falling behind during the day. When it finally caught up at night, most
of the batches were discarded as duplicates.

The fix was installing the msgidd daemon written by Paul Vixie of DEC.
It stores in memory the last N minutes' worth of message-ids received.
Each nntpd communicates with it via a Unix-domain socket to check whether
an incoming article has been received but not yet processed. This has
been a huge improvement.

The msgidd code can be obtained from the NNTP managers archive on either
ucbvax or ucbarpa.berkeley.edu, or in the file pub/nntp-1.5.10+.tar.Z on
uxc.cso.uiuc.edu.

/pbp
--
Paul Pomes      alias emacs='/usr/ucb/vi' -- All the EMACS you need to know.
UUCP: {att,iuvax,uunet}!uiucuxc!paul  Internet, BITNET: paul@uxc.cso.uiuc.edu
US Mail:  UofIllinois, CSO, 1304 W Springfield Ave, Urbana, IL  61801-2910
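
For readers who want a feel for the idea behind msgidd as described in the post, here is a minimal Python sketch of an in-memory table of recently offered message-ids. It is not Paul Vixie's code and does not reproduce msgidd's protocol: the class name RecentIds, the check_and_add method and the window_seconds parameter are assumptions made for illustration, and the Unix-domain socket that each nntpd would use to reach the daemon is left out entirely.

```python
import time
from collections import OrderedDict


class RecentIds:
    """Hypothetical table of message-ids offered in the last window_seconds."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock            # injectable so the window can be tested
        self._seen = OrderedDict()     # msgid -> time first offered, oldest first

    def _expire(self, now):
        # Drop entries from the old end until one is still inside the window.
        while self._seen:
            msgid, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self._window:
                break
            self._seen.popitem(last=False)

    def check_and_add(self, msgid):
        """Return True if msgid was already offered within the window.

        Otherwise remember it and return False, so the caller accepts the
        article and any other feed offering it shortly after is refused.
        """
        now = self._clock()
        self._expire(now)
        if msgid in self._seen:
            return True
        self._seen[msgid] = now
        return False
```

A receiving process would consult such a table in addition to the history file, refusing an article when either one already knows the id. The window only needs to cover the delay between accepting an article and its id reaching the history file.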