Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!bellcore!decvax!ittatc!dcdwest!sdcsvax!ucbvax!hplabs!oliveb!glacier!reid
From: reid@glacier.ARPA (Brian Reid)
Newsgroups: net.news.adm,net.news.b
Subject: Re: curious 2.10.3 efficiency situation
Message-ID: <5104@glacier.ARPA>
Date: Sat, 8-Mar-86 03:19:57 EST
Article-I.D.: glacier.5104
Posted: Sat Mar  8 03:19:57 1986
Date-Received: Mon, 10-Mar-86 00:17:25 EST
References: <5044@glacier.ARPA>
Reply-To: reid@glacier.UUCP (Brian Reid)
Organization: Stanford University, Computer Systems Lab
Lines: 41
Xref: watmath net.news.adm:550 net.news.b:1309


Well, I've managed to get my netnews under control, after several days of
wrestling with it. I'm worried, though, that this is just the first of many
hiccups like this, and that perhaps a Vax 750 is not a big enough computer
to be a backbone site running this news software.

The problem is that we have 2 primary feeds; each potentially feeds us about
500 articles a day. In the steady state, when we make regular contact with
both feeds, we end up with about 150 from oliveb and about 350 from decwrl.
The compress/inews pipeline is able to handle about 3 articles a minute, so
our normal load is about 3 hours per day (out of 24) devoted to incoming
news.
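
As a rough check on that arithmetic, here is a minimal sketch. The only inputs are the figures quoted above (150 + 350 articles a day, 3 articles a minute); the function name is made up for illustration.

```python
# Steady-state load, using only the figures quoted in the post.
ARTICLES_PER_DAY = 150 + 350   # oliveb + decwrl
RATE_PER_MINUTE = 3            # compress/inews pipeline throughput


def hours_of_processing(articles, rate_per_minute=RATE_PER_MINUTE):
    """Hours of pipeline time needed for the given number of articles."""
    return articles / rate_per_minute / 60.0


print(round(hours_of_processing(ARTICLES_PER_DAY), 1))  # 2.8
```

That is a little under the "about 3 hours per day" stated above.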

When we were out of action for a couple of days, the steady-state rhythm
was broken, and both oliveb and decwrl queued all of the messages for us.
They arrived and sat in UUXQT queues, and while they sat there, there was no
record that we had the articles, so we didn't send them on to the other
feed, and so on: a vicious cycle. The net result was that we were getting a
torrent of 1000 articles per day, with many duplicates, from our two feeds.
Since we had a backlog of 2500 articles waiting to be processed, we never
caught up; glacier was not able to process 1000 articles per day, but only
by catching up, by getting all 3500 articles processed, would the input flow
be reduced back down to 500 per day.
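
The trap can be sketched as a toy model. It assumes the inflow stays at 1000 articles a day for as long as any backlog remains, as described above. The daily capacity values used below are hypothetical, since I have no measured figure for what glacier can process with users on the machine.

```python
def days_to_clear(backlog, capacity_per_day,
                  inflow_while_behind=1000, max_days=365):
    """Days until the backlog reaches zero, or None if it never does.

    Toy model: every day the backlog grows by the inflow and shrinks
    by whatever the machine manages to process that day.
    """
    for day in range(1, max_days + 1):
        backlog += inflow_while_behind - capacity_per_day
        if backlog <= 0:
            return day
    return None


# Hypothetical capacities, for illustration only.
print(days_to_clear(2500, 900))    # below the inflow: never catches up
print(days_to_clear(2500, 1500))   # above the inflow: slowly drains
```

Any capacity at or below the 1000-a-day inflow leaves the backlog where it is or growing, so the queue only drains once the machine can do better than the doubled arrival rate.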

What I finally did to fix it was to hotwire inews/rnews to run at nice -18,
and then to replace /bin/csh with a program that said "sorry, no logins
permitted at this time". This kept users off the machine while letting
uucico through. With no other users on the machine, but with uucp running
full bore, I was able to get my backlog of 3500 messages processed in about
20 hours. Then I reset everything back to normal; we've been running this
way for 3 hours now and the UUXQT queues are normal.
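
As a sanity check on those numbers, a short sketch using only the figures above (3500 messages, about 20 hours); the function name is made up for illustration.

```python
def articles_per_minute(articles, hours):
    """Average processing rate over the given period."""
    return articles / (hours * 60.0)


print(round(articles_per_minute(3500, 20), 1))  # 2.9
```

That is close to the 3 articles a minute quoted earlier for the compress/inews pipeline.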

I don't yet have a solution to this problem, but I predict that we are not
the last site that it will happen to. If I had not been able to kick all of
my users off the machine for a day, then I really don't know how we could
have recovered from this, other than by asking our feeds to remove
net.religion and net.politics and that sort of thing, in order to decrease
the arrival traffic.
-- 
	Brian Reid	decwrl!glacier!reid
	Stanford	reid@SU-Glacier.ARPA