Xref: utzoo tor.news:130 ont.uucp:372
Path: utzoo!yunexus!spectrix!clewis
From: clewis@spectrix.UUCP (Chris R. Lewis)
Newsgroups: tor.news,ont.uucp
Subject: Re: news
Keywords: news, flood, alarm
Message-ID: <440@spectrix.UUCP>
Date: 11 Feb 88 21:22:30 GMT
Article-I.D.: spectrix.440
Posted: Thu Feb 11 16:22:30 1988
References: <8802050452.AA03771@gpu.utcs.toronto.edu> <314@yunexus.UUCP> <584@ncrcan.Toronto.NCR.COM>
Reply-To: clewis@spectrix.UUCP (Chris R. Lewis)
Distribution: ont
Organization: Spectrix Microsystems Inc., Toronto, Ontario, Canada
Lines: 100

In article <584@ncrcan.Toronto.NCR.COM> brian@ncrcan.Toronto.NCR.COM (Brian Onn) writes:
>We can and are getting more disks, but that's not a solution.  Is this news
>explosion a passing fad? or is it to be expected again?

I'm wondering whether we're getting some sort of positive feed-back loop
around here someplace.  Some idiosyncrasy in the batch throttles being
used might be causing this.  Henry once suggested the same.

Particularly this last flood.  Eg: lsuc was merrily taking 900K to 1700K
bytes per day from utzoo for a long period, then abruptly went over 3
megabytes per day for several days, then dropped back down to 300K, and
is now starting to settle out again.

Pardon the disjointed nature of the posting - I'm not quite sure I 
understand all of the ramifications, so consider this "thinking aloud".

Consider the following scenario:

	1) you have an "incoming" throttle - if spool gets too low you
	   stop news unpacking, or, more drastically, inhibit uucico
	   from your feed site.
	2) you have outgoing throttles - if spool gets too low, you
	   start inhibiting the creation of batches for a downstream.
	3) You're running close to the edge.

Now, let us say that your outgoing batch is at the limit or close to it.
(particularly, if one of your downstreams is stuck and they're using all
of your "headroom").  Then your delivery to other downstreams gets pretty
slow, your incoming feed takes up the rest of your space, and then the
incoming feed is turned way down.  Things slow down a lot.  If the downstream
picks up again, the spool empties, your incoming feed gets turned on again
and you get a huge flood.  Expire helps, but perhaps not a lot if you
have many downstream sites.  When the huge flood comes in then you're in
deep trouble - because you'll have a big jump in disk usage until things
get old enough to start expiring again.  Which, for example, is why
lsuc, tmsoft, yunexus and ourselves were accelerating our expiry schedules
during this last flood (we were doing rm -fr's at one point!)

Oscillations in incoming load will lead to corresponding (and probably
worse) oscillations in your disk usage.  Spectrix has no real outgoing
feeds, and we don't get a full feed either - still our spool area seems 
to "breathe" by 50% over a 3-7 day cycle.  A lot of this is due to
our incoming throttle slowing down the incoming feed.

Lsuc's spool oscillates between about 500K (last-ditch throttles kick in)
and 5Mb free spool...  What might be happening with lsuc is the following:

	1) a single downstream slows down (as a trigger)
	2) spool fills, other batching slows down
	3) incoming starts overrunning spool
	4) incoming throttled down (lsuc disables uucico as a last-ditch
	   defence; the first line is simply not unpacking the news which
	   is still in spool).
	5) expire cleans up some space, and/or stuck downstream starts to
	   catch up.
	6) batcher uses up space for other downstreams and batching
	   speeds up (lsuc runs batcher far more frequently than it
	   successfully connects to upstream).
	7) *eventually* downstreams catch up and spool gets more space
	8) incoming throttled up
	9) delayed incoming batches cause a flood and spool fills.
	10) downstreams slow down due to lack of spool - we're back
	   at step two.

Doesn't take much to see that without sufficient "damping" this could
be self-perpetuating.  And, perhaps more importantly, it would induce very
similar problems on both upstream and downstream neighbors.  Particularly
if your throttles were set very close to the end of the disk.
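
To make the loop a bit more concrete, here is a toy simulation in Python.
It is only a sketch under made-up assumptions, not a model of lsuc or any
real site: the function name, the units and every number in it are
hypothetical, the incoming throttle is a simple on/off switch on free
space, and a stuck downstream is modelled as a fixed lump of batches
pinned in the same spool for a few days.

```python
def simulate_spool(days, capacity=10, daily_news=2, retention=3,
                   throttle_free=3, stuck=6, stall_days=range(3, 6)):
    """Return the amount of news accepted into the spool on each day.

    All sizes are in arbitrary units (say, megabytes); all defaults are
    illustrative assumptions, not measurements.
    """
    accepted = []   # news unpacked into the spool on each day
    backlog = 0     # news waiting upstream while we are throttled
    for day in range(days):
        backlog += daily_news
        # Articles occupy the spool for `retention` days, then expire.
        held = sum(accepted[max(0, day - retention + 1):day])
        # A stuck downstream pins its queued batches in the same spool.
        pinned = stuck if day in stall_days else 0
        free = capacity - held - pinned
        if free >= throttle_free:
            # Throttle open: the whole backlog arrives, limited only by disk.
            taken = min(backlog, free)
        else:
            # Throttle closed: nothing is accepted and the backlog grows.
            taken = 0
        accepted.append(taken)
        backlog -= taken
    return accepted


print(simulate_spool(12))
```

With these defaults the output is [2, 2, 2, 0, 0, 4, 4, 0, 4, 2, 2, 2].
The feed stops while the downstream is stuck (days 3 and 4), comes back
at double the normal rate, and that flood alone closes the throttle again
on day 7, after the downstream has already recovered.  Here the expire
damps it out after one extra cycle; with less free space it would take
longer.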

So far the damping is solely manual - like everybody's emergency expires.

Without throttles this wouldn't be such a big problem because you wouldn't 
be trying to run anywhere near so close to the edge on your disks.  For 
example, at lsuc the throttles come partially on at 1Mb free, and go to 
panic mode at .5Mb (as I remember how I set it up) - but as mentioned before,
the system hovers at 1Mb to 5Mb free - one sneeze and the throttles kick in
and possibly make the problem worse later.
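
As a sketch of how little headroom that is, the Python below classifies
free space against the two lsuc thresholds mentioned above (1Mb and .5Mb,
taken here as 1000K and 500K) and estimates how many days of feed fit
before the first throttle trips.  The function names are hypothetical,
and the mapping of "partial" to "stop unpacking" and "panic" to "inhibit
uucico" is my reading of the steps earlier in this posting, not a
description of the actual scripts.

```python
def throttle_level(free_kb, partial_kb=1000, panic_kb=500):
    """Classify free spool space against two thresholds (in kilobytes)."""
    if free_kb <= panic_kb:
        return "panic"      # assumed: last ditch, inhibit uucico
    if free_kb <= partial_kb:
        return "partial"    # assumed: first line, stop unpacking news
    return "off"


def headroom_days(free_kb, daily_kb, partial_kb=1000):
    """Days of incoming feed that fit before the partial throttle trips."""
    return max(0, free_kb - partial_kb) / daily_kb


print(throttle_level(5000), throttle_level(800), throttle_level(500))
print(headroom_days(4000, 3000))
```

At 4000K free, a 3000K-per-day flood like the recent one reaches the
first threshold in a single day.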

This probably requires a considerable amount of thought about recommended 
free-space, and carefully selected thresholds for per-system batch limits, 
outbatching spool limits and incoming spool limits.

Things that I would think would help:

1) keep incoming batches, outgoing batches and unpacked articles on 
   different file systems - this will reduce throttle interaction ("impacted 
   spools" - "I can't get rid of any of this s**t because there isn't any
   room to send it!").
2) Making sure that the queue limit for a downstream is quite small compared
   to your spool free average (ideally, queuelimit total for all downstreams
   is less than your spool free average)
3) Invoking the batcher fast enough to reasonably keep up with a downstream 
   that is being connected to at the "desired rate".  Ideally, if the
   downstream connects, invoke the batcher often enough that the queue never 
   empties.  Eg: if a downstream's queue limit can be transferred in
   one hour and 15 minutes, invoke the batcher every hour.
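
Points 2 and 3 amount to a little arithmetic, sketched here in Python.
The function names, the site names and the queue sizes are all
hypothetical; the 80% margin is simply the ratio that turns the
75-minute example above into an hourly batcher run, and a different
margin may suit a different link.

```python
def queue_limits_fit(queue_limits_kb, avg_free_kb):
    """Point 2: total of all downstream queue limits should be less
    than the average free spool."""
    return sum(queue_limits_kb.values()) < avg_free_kb


def batcher_interval_minutes(drain_minutes, margin_percent=80):
    """Point 3: run the batcher somewhat more often than the time it
    takes the downstream to drain a full queue, so the queue never
    empties while the downstream is connected."""
    return drain_minutes * margin_percent // 100


# Hypothetical downstreams and limits, in kilobytes.
limits = {"site_a": 300, "site_b": 400}
print(queue_limits_fit(limits, avg_free_kb=3000))
print(batcher_interval_minutes(75))
```

The second call gives 60, matching the example in point 3.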
-- 
Chris Lewis, Spectrix Microsystems Inc,
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc, yunexus}!spectrix!clewis
Phone: (416)-474-1955