Xref: utzoo tor.news:130 ont.uucp:372
Path: utzoo!yunexus!spectrix!clewis
From: clewis@spectrix.UUCP (Chris R. Lewis)
Newsgroups: tor.news,ont.uucp
Subject: Re: news
Keywords: news, flood, alarm
Message-ID: <440@spectrix.UUCP>
Date: 11 Feb 88 21:22:30 GMT
Article-I.D.: spectrix.440
Posted: Thu Feb 11 16:22:30 1988
References: <8802050452.AA03771@gpu.utcs.toronto.edu> <314@yunexus.UUCP> <584@ncrcan.Toronto.NCR.COM>
Reply-To: clewis@spectrix.UUCP (Chris R. Lewis)
Distribution: ont
Organization: Spectrix Microsystems Inc., Toronto, Ontario, Canada
Lines: 100

In article <584@ncrcan.Toronto.NCR.COM> brian@ncrcan.Toronto.NCR.COM (Brian Onn) writes:
>We can and are getting more disks, but that's not a solution. Is this news
>explosion a passing fad? or is it to be expected again?

I'm wondering whether we're getting some sort of positive feedback loop
around here someplace. There may be some idiosyncrasy with the batch
throttles being used that might be causing this. Henry once suggested
the same. Particularly this last flood. Eg: lsuc was merrily taking
900K to 1700K bytes per day from utzoo for a long period; it abruptly
went over 3 megabytes for several days, then dropped back down to 300K,
and is now starting to settle out again.

Pardon the disjointed nature of the posting - I'm not quite sure I
understand all of the ramifications, so consider this "thinking aloud".

Consider the following scenario:

    1) you have an "incoming" throttle - if free spool gets too low you
       stop news unpacking, or, more drastically, inhibit uucico from
       your feed site.
    2) you have outgoing throttles - if free spool gets too low, you
       start inhibiting the creation of batches for a downstream.
    3) You're running close to the edge.

Now, let us say that your outgoing batch is at the limit or close to it
(particularly if one of your downstreams is stuck and is using all of
your "headroom").
Then your delivery to other downstreams gets pretty slow, your incoming
feed takes up the rest of your space, and then the incoming feed is
turned way down. Things slow down a lot. If the downstream picks up
again, the spool empties, your incoming feed gets turned on again and
you get a huge flood. Expire helps, but perhaps not a lot if you have
many downstream sites.

When the huge flood comes in you're in deep trouble - because you'll
have a big jump in disk usage until things get old enough to start
expiring again. Which, for example, is why lsuc, tmsoft, yunexus and
ourselves were accelerating our expiry schedules during this last flood
(we were doing rm -fr's at one point!)

Oscillations in incoming load will lead to corresponding (and probably
worse) oscillations in your disk usage. Spectrix has no real outgoing
feeds, and we don't get a full feed either - still our spool area seems
to "breathe" by 50% over a 3-7 day cycle. A lot of this is due to our
incoming throttle slowing down the incoming feed. Lsuc's spool
oscillates between about 500K (last-ditch throttles kick in) and 5Mb
free spool...

What might be happening with lsuc is the following:

    1) a single downstream slows down (as a trigger)
    2) spool fills, other batching slows down
    3) incoming starts overrunning spool
    4) incoming throttled down (lsuc disables uucico as a last-ditch
       defence; the first line is simply not unpacking the news, which
       is still in spool).
    5) expire cleans up some space, and/or stuck downstream starts to
       catch up.
    6) batcher uses up space for other downstreams and batching speeds
       up (lsuc runs batcher far more frequently than it successfully
       connects to upstream).
    7) *eventually* downstreams catch up and spool gets more space
    8) incoming throttled up
    9) delayed incoming batches cause a flood and spool fills.
    10) downstreams slow down due to lack of spool - we're back at step
        two.

Doesn't take much to see that without sufficient "damping" this could
be self-perpetuating.
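
The loop above can be mimicked with a toy simulation. This is only a
rough sketch in Python: the function name, the parameters and every
number in it are made up for illustration and are not lsuc's (or
anybody's) real configuration. "drain" lumps together expire and
delivery to downstreams, and a stuck downstream is modelled as no drain
at all for the first few steps.

```python
def simulate(steps, capacity, arrival, drain, link_rate, throttle_at, stall_until):
    """Return the free spool space after each step of a toy news spool.

    All units are arbitrary; every parameter is a hypothetical knob.
    """
    used = 0      # spool space currently occupied
    backlog = 0   # news piling up at the upstream feed while we are throttled
    free_history = []
    for t in range(steps):
        backlog += arrival                 # upstream keeps generating news
        free = capacity - used
        if free > throttle_at:
            # Throttle open: take the backlog, limited by the link and by free space.
            received = min(backlog, link_rate, free)
        else:
            # Incoming throttle engaged: accept nothing this step.
            received = 0
        backlog -= received
        # The stuck downstream (the trigger) frees no space until stall_until.
        freed = 0 if t < stall_until else drain
        used = max(0, used + received - freed)
        free_history.append(capacity - used)
    return free_history


# Drain exactly matches arrival: no headroom, so the spool never settles.
tight = simulate(steps=20, capacity=10, arrival=2, drain=2,
                 link_rate=6, throttle_at=3, stall_until=4)
# Drain slightly exceeds arrival: the same trigger damps out.
roomy = simulate(steps=20, capacity=10, arrival=2, drain=3,
                 link_rate=6, throttle_at=3, stall_until=4)
print(tight)
print(roomy)
```

In the first run the free space keeps bouncing across the throttle
threshold long after the stuck downstream has recovered; in the second,
the extra drain acts as damping and the spool empties out again.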

Perhaps more importantly, such a loop would induce very similar
problems on both upstream and downstream neighbors, particularly if
your throttles were set very close to the end of the disk. So far the
damping is solely manual - like everybody's emergency expires.

Without throttles this wouldn't be such a big problem because you
wouldn't be trying to run anywhere near so close to the edge on your
disks. For example, at lsuc the throttles come partially on at 1Mb
free, and go to panic mode at .5Mb (as I remember how I set it up) -
but as mentioned before, the system hovers at 1Mb to 5Mb free - one
sneeze and the throttles kick in and possibly make the problem worse
later.

This probably requires a considerable amount of thought about
recommended free space, and carefully selected thresholds for
per-system batch limits, outbatching spool limits and incoming spool
limits.

Things that I would think would help:

    1) keep incoming batches, outgoing batches and unpacked articles on
       different file systems - this will reduce throttle interaction
       ("impacted spools" - "I can't get rid of any of this s**t
       because there isn't any room to send it!").
    2) Make sure that the queue limit for a downstream is quite small
       compared to your spool free average (ideally, the queue limit
       total for all downstreams is less than your spool free average).
    3) Invoke the batcher fast enough to reasonably keep up with a
       downstream that is being connected to at the "desired rate".
       Ideally, if the downstream connects, invoke the batcher often
       enough that the queue never empties. Eg: if a downstream's queue
       limit can be transferred in one hour and 15 minutes, invoke the
       batcher every hour.
-- 
Chris Lewis, Spectrix Microsystems Inc,
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc, yunexus}!spectrix!clewis
Phone: (416)-474-1955
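
Suggestions 2) and 3) lend themselves to a quick sanity check. A
minimal Python sketch follows; the site names, queue limits, free-space
figure and link speed are all invented for illustration, and only the
"75 minutes of queue, batch every hour" case comes from the example in
3).

```python
def queue_limits_fit(queue_limits_kb, avg_free_spool_kb):
    """Suggestion 2: total of all downstream queue limits stays below average free spool."""
    return sum(queue_limits_kb.values()) < avg_free_spool_kb


def drain_minutes(queue_limit_kb, link_kb_per_minute):
    """Time a connected downstream needs to pull a completely full queue."""
    return queue_limit_kb / link_kb_per_minute


def batcher_keeps_up(interval_minutes, queue_limit_kb, link_kb_per_minute):
    """Suggestion 3: the batcher runs again before a full queue can be emptied."""
    return interval_minutes <= drain_minutes(queue_limit_kb, link_kb_per_minute)


# Hypothetical downstreams and their queue limits, in kilobytes.
limits = {"siteA": 300, "siteB": 400}
print(queue_limits_fit(limits, avg_free_spool_kb=1000))

# A 750K queue over a link moving 10K per minute takes 75 minutes to drain.
print(drain_minutes(750, 10))
print(batcher_keeps_up(60, 750, 10))    # hourly batcher
print(batcher_keeps_up(120, 750, 10))   # batcher every two hours
```

If the first check fails, the downstream queues alone can eat all of
the usual free space; if the second fails, a healthy downstream can
empty its queue and then sit idle until the next batcher run.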