Xref: utzoo news.software.nntp:1099 news.software.b:6844
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!sdd.hp.com!hplabs!otter.hpl.hp.com!hpltoad!hpinddr!richv
From: richv@hpinddu.cup.hp.com (Rich Van Gaasbeck)
Newsgroups: news.software.nntp,news.software.b
Subject: Can nntp handle a hierarchy of spools?
Message-ID: <RICHV.91Feb6175707@hpinddr.cup.hp.com>
Date: 6 Feb 91 17:57:07 GMT
Sender: news@hplb.hpl.hp.com (Usenet News Administrator)
Organization: Hewlett-Packard, Cupertino, CA.
Lines: 72
Nntp-Posting-Host: anorman.hpl.hp.com

Here is a problem that is probably common to many large news-reading
organizations.  

Scenario A: "Typical".  Say you have a large population of news readers
(10,000 people) and a large number of news machines (100), each with
some disk space (200 Meg).  This might typically be arranged as 100
machines running B or C news, with 100 users on each machine.  Let's
say 200 Meg holds about one month of news.

Let's say that you decide that one month's worth of news is not enough.
You want to keep "local" groups and other groups that your
organization thinks important for several years.  Let's say that you
think that you will need 2 Gigabytes to store one copy.  Let's also
assume that your organization doesn't want to buy 2 Gigabytes for each
of 100 machines.  Given the resource limitation and your realization
that each machine is storing pretty much the same information, you
come up with a new scheme for your local news network.

Scenario B: "Central".  You gather all the spool disk drives from all
the news machines and put them on one super colossal machine.  Users
read news using an nntp-based news reader or log in via telnet to use
a local news reader.

This doesn't work out very well either.  You may not find a machine
fast enough to handle 10,000 users.  If your organization is
world-wide there may be no time when it is night everywhere, so it
will be impossible to take down the system to do backups.  Worse, if
the system goes down, none of the 10,000 users can read their news.

Here is the way I would like to see C-news/nntp work.  I think it
would be possible, but I don't think they can do it today (I could be
wrong; I suppose I should look at the docs and source).

Scenario C: "Ideal".  Start with Scenario A.  Take away 100 Meg from
each machine and give it to a central machine (for a total of 10
Gigabytes).  Configure the 99 "local" machines to automatically expire
articles that haven't been read recently (optionally just use the
current expire mechanism with parameters to keep it under 100 megs).
Change the nntp daemon to list both local information and information
from the central machine when asked about active newsgroups, headers,
etc.  When asked to retrieve an article it would get it from the
central server if necessary, give it to the user and also store it in
the local spool.  Basically it would act as a giant cache.  To the
news-reading programs it would look like a single central machine, but
the central machine would be less busy, could be located on the far
side of a low-performance network, and the local machines could
continue to serve cached articles while the central machine is down.
Additionally, a larger amount of news can be made available to a
greater number of people while using much less disk space than giving
every machine its own 2 Gigabytes.  Slight variations could also be
useful.  If your caching
algorithm has a high hit rate you might be able to get by with a much
smaller spool (5 or 10 Meg).  You might also want to distribute the
central "machine".  For instance one "central" machine might hold
comp.sources.unix, a different one rec.*.
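
To make the caching idea in Scenario C concrete, here is a minimal
sketch in Python.  It is not taken from the c-news or nntp sources.
The class name CachingSpool and the fetch_from_central callable are
hypothetical, and "expire articles that haven't been read recently" is
approximated by evicting the least recently read article once a byte
budget is exceeded.

```python
from collections import OrderedDict


class CachingSpool:
    """Read-through article cache; evicts the least recently read article first."""

    def __init__(self, fetch_from_central, capacity_bytes):
        # fetch_from_central is a hypothetical callable: message_id -> bytes
        self.fetch_from_central = fetch_from_central
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.articles = OrderedDict()  # message_id -> bytes, least recently read first

    def get(self, message_id):
        if message_id in self.articles:
            self.articles.move_to_end(message_id)  # mark as recently read
            return self.articles[message_id]
        article = self.fetch_from_central(message_id)  # miss: ask the central machine
        self._store(message_id, article)
        return article

    def _store(self, message_id, article):
        if len(article) > self.capacity_bytes:
            return  # too big to cache; the reader still gets it
        self.articles[message_id] = article
        self.used_bytes += len(article)
        while self.used_bytes > self.capacity_bytes:
            _, evicted = self.articles.popitem(last=False)  # drop least recently read
            self.used_bytes -= len(evicted)


# Usage: a 100-byte local spool in front of a two-article "central machine".
central = {"<1@a>": b"x" * 60, "<2@a>": b"y" * 60}
fetches = []

def fetch(message_id):
    fetches.append(message_id)
    return central[message_id]

spool = CachingSpool(fetch, capacity_bytes=100)
spool.get("<1@a>")  # fetched from central and stored locally
spool.get("<1@a>")  # served from the local spool
spool.get("<2@a>")  # fetched; <1@a> is evicted to stay under 100 bytes
```

A real daemon would also have to decide what to do when the central
machine is unreachable on a miss; this sketch simply lets that error
propagate.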

As I mentioned above, I don't think that the current c-news/nntp
implementation can handle Scenario C.  Several parts are missing.  I
don't think c-news can expire articles based on readership patterns.
I also don't think that nntpd can keep track of a local spool and
potentially several remote sources of articles and present them to the
remote news reader as if they came from the same machine.
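
As a rough illustration of the second missing part, the sketch below
(Python, hypothetical function names) merges a local active list with
one or more remote ones so a reader would see a single list.  It
assumes active-file lines of the form "group high low flag" and, more
importantly, that every machine uses the central machine's article
numbers.  Independent news servers normally number articles
separately, so that assumption would have to be arranged somehow.

```python
def parse_active(text):
    """Parse 'group high low flag' lines into {group: (high, low, flag)}."""
    groups = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, high, low, flag = fields
        groups[name] = (int(high), int(low), flag)
    return groups


def merge_active(local, *remotes):
    """Combine active tables: widest article range wins, local flag is kept."""
    merged = dict(local)
    for remote in remotes:
        for name, (high, low, flag) in remote.items():
            if name in merged:
                l_high, l_low, l_flag = merged[name]
                merged[name] = (max(high, l_high), min(low, l_low), l_flag)
            else:
                merged[name] = (high, low, flag)
    return merged


# Usage: the local spool holds only recent articles; central holds everything.
local = parse_active("comp.lang.c 0000000120 0000000100 y\n")
central = parse_active(
    "comp.lang.c 0000000120 0000000001 y\n"
    "comp.sources.unix 0000000050 0000000001 m\n"
)
merged = merge_active(local, central)
```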

I would be interested in hearing any comments on the above.
Specifically, does this seem like a common problem for large
organizations?  Does this mixture of nntp, c-news and caching concepts
look like a good solution to the problem?  Are the nntp or c-news
authors doing any work in this area?  What changes to the c-news and
nntp sources would be needed to make Scenario C work?

Richv
richv%hpinddu.cup.hp.com@hplabs.hpl.hp.com