Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!brutus.cs.uiuc.edu!coolidge
From: coolidge@brutus.cs.uiuc.edu (John Coolidge)
Newsgroups: news.software.b
Subject: Re: Lots of dups
Summary: Tradeoffs
Message-ID: <1989Oct27.023751.29741@brutus.cs.uiuc.edu>
Date: 27 Oct 89 02:37:51 GMT
References: <1989Oct25.164024.14894@ctr.columbia.edu> <1989Oct25.205129.16397@brutus.cs.uiuc.edu> <1989Oct26.164042.4692@utzoo.uucp>
Sender: news@brutus.cs.uiuc.edu
Reply-To: coolidge@cs.uiuc.edu
Organization: U of Illinois, CS Dept., Systems Research Group
Lines: 72

[Apologies for reposting --- first copy was cancelled due to header trouble]

henry@utzoo.uucp (Henry Spencer) writes:
>In article <1989Oct25.205129.16397@brutus.cs.uiuc.edu> coolidge@cs.uiuc.edu writes:
>>The problem is that with REALLY fast feeds, even processing articles
>>once a minute is not fast enough. The problem lies in nntpd accepting
>>multiple copies because the queue hasn't been run yet...
>The fundamental problem, however, goes even deeper. There are two inherently
>conflicting desires:
>1. Minimum processing latency. [...]
>2. Minimum processing overhead. [...]
>There is NO WAY to satisfy both of these desires simultaneously. All you
>can do is strike some sort of compromise, depending on your own priorities.

That's true in general. I've found a specific exception, but I think
that's mainly because of the vast difference in cost that comes from
1) using perl and not sh (at the cost of some portability, but not too
much), and 2) starting just one relaynews per pass over the incoming
directory, rather than one per file as the default newsrun does. In any
case, my system (write each incoming article as a separate file, then
feed them all down a pipe into one relaynews) seems to have done a great
job of minimizing both 1 and 2. For the moment, most of my compromising
and hacking have been aimed explicitly at goal (1). Goal (2) has come
along as an unexpected bonus.
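
To make the "one file per article, one relaynews per pass" idea concrete,
here is a minimal sketch in Python (not the perl script described above,
which is not shown in this article). The names frame_articles,
run_single_pass, incoming_dir, and consumer_cmd are all hypothetical. The
sketch assumes the consumer reads a batch on standard input in which each
article is preceded by a "#! rnews <byte count>" line; whether the actual
setup framed articles this way is an assumption, as is the exact command
line for relaynews, so the command is left as a parameter.

```python
import subprocess
from pathlib import Path


def frame_articles(articles):
    """Join article bodies into one stream.

    Each article is preceded by a '#! rnews <byte count>' line so the
    reader can tell where one article ends and the next begins.
    """
    chunks = []
    for body in articles:
        chunks.append(b"#! rnews %d\n" % len(body))
        chunks.append(body)
    return b"".join(chunks)


def run_single_pass(incoming_dir, consumer_cmd):
    """Feed every file in incoming_dir to ONE consumer process via a pipe.

    consumer_cmd is a hypothetical argv list standing in for the news
    processor; its real name and flags are site-specific assumptions.
    Returns (number of articles fed, consumer exit status).
    """
    paths = sorted(p for p in Path(incoming_dir).iterdir() if p.is_file())
    if not paths:
        return 0, 0  # nothing queued, so do not start the consumer at all

    stream = frame_articles([p.read_bytes() for p in paths])

    # One process start per pass, no matter how many articles are waiting.
    result = subprocess.run(consumer_cmd, input=stream)

    # Remove the files only if the consumer reported success, so a failed
    # pass can simply be retried on the next run.
    if result.returncode == 0:
        for p in paths:
            p.unlink()
    return len(paths), result.returncode
```

The point of the design is that the startup cost of the consumer is paid
once per pass rather than once per article, while each article still
becomes visible as its own file as soon as it arrives. A real version
would also need the receiving side to write each file under a temporary
name and rename it into place, so that a pass never reads a half-written
article; that detail is left out here.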

>In particular, if you have very fast feeds and optimize for minimum overhead,
>you will inevitably receive lots of articles more than once, although C News
>will throw away the duplicates quite efficiently. (I don't really see that
>there is cause for great alarm about efficiently-discarded duplicates.)

Quite right. Duplicates mainly cost some extra disk traffic (when I used
to get duplicates, it was generally an entire batch full --- now each
duplicate, like everything else, is a separate file). They don't cost
very much, but I still prefer to keep them down, in part because a low
duplicate count is a sign that our processing latency is low.

>C News is generally slanted towards minimum overhead, given our observation
>that B News was increasingly eating our machines alive doing one article at
>a time.

Thanks for the work! Even with all of the performance hacks I've put in,
most of the speed of our system (Sun 3/160, SunOS 4.0.3) is due to
C News. The overall cost of news on our machine is very low, so low that
we can use it as a YP server and user machine along with running news and
still have a reasonable response time for all functions.

>(Incidentally, the oft-seen suggestion of running relaynews as a daemon
>doesn't really help very much. An article is not really received until
>it, its history-file line, and the update to the relevant active-file
>line(s), are flushed out to disk. Flushing data to disk is a big part
>of the setup/teardown overhead. If you do it once per article, the
>overhead goes way up. If you batch the disk flushing, you're back to
>having a significant window in which the article has been received but
>this fact is not universally and positively known yet. The relaynews
>daemon might be a net improvement, but it is *not* an escape from the
>fundamental dilemma.)

This I'm not so sure of (but you've probably measured it, so...). On the
other hand, it's another part of why running one relaynews per article
(or even batch) isn't a good idea.
Relaynews has costs in opening active, history, and sys; in writing out
the article; and in updating active and history.

--John

--------------------------------------------------------------------------
John L. Coolidge Internet:coolidge@cs.uiuc.edu UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.