Xref: utzoo news.software.nntp:1099 news.software.b:6844
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!sdd.hp.com!hplabs!otter.hpl.hp.com!hpltoad!hpinddr!richv
From: richv@hpinddu.cup.hp.com (Rich Van Gaasbeck)
Newsgroups: news.software.nntp,news.software.b
Subject: Can nntp handle a hierarchy of spools?
Message-ID:
Date: 6 Feb 91 17:57:07 GMT
Sender: news@hplb.hpl.hp.com (Usenet News Administrator)
Organization: Hewlett-Packard, Cupertino, CA.
Lines: 72
Nntp-Posting-Host: anorman.hpl.hp.com

Here is a problem that is probably common to many large news-reading organizations.

Scenario A: "Typical". Say you have a large population of news readers (10,000 people) and a large number of news machines (100), each with some disk space (200 Meg). This might typically be arranged as 100 machines running B or C news, with 100 users on each machine. Let's say 200 Meg holds about one month of news.

Let's say that you decide that one month's worth of news is not enough. You want to keep "local" groups, and other groups that your organization thinks important, for several years. Let's say that you think you will need 2 Gigabytes to store one copy. Let's also assume that your organization doesn't want to buy 2 Gigabytes for each of 100 machines. Given the resource limitation and your realization that each machine is storing pretty much the same information, you come up with a new scheme for your local news network.

Scenario B: "Central". You gather all the spool disk drives from all the news machines and put them on one super colossal machine. Users read news using an nntp-based news reader, or log in via telnet to use a local news reader.

This doesn't work out very well either. You may not find a machine fast enough to handle 10,000 users. If your organization is world-wide there may be no time when it is night everywhere, so it will be impossible to take down the system to do backups. Worse, if the system goes down, none of the 10,000 users can read their news.
Here is the way I would like to see C-news/nntp work. I think it would be possible to do, but I don't think it can be done today (I could be wrong; I suppose I should look at the docs and source).

Scenario C: "Ideal". Start with Scenario A. Take away 100 Meg from each machine and give it to a central machine (for a total of 10 Gigabytes). Configure the 99 "local" machines to automatically expire articles that haven't been read recently (optionally, just use the current expire mechanism with parameters to keep the spool under 100 Meg). Change the nntp daemon to list both local information and information from the central machine when asked about active newsgroups, headers, etc. When asked to retrieve an article, it would get it from the central server if necessary, give it to the user, and also store it in the local spool. Basically it would act as a giant cache.

To the news reading programs it would look like a single central machine, but with the advantages that the central machine would be less busy, that it could be located on the far side of a low-performance network, and that the local machines could continue to allow access to cached articles while the central machine is down. Additionally, a larger amount of news can be made available to a greater number of people while using much less disk space.

Slight variations could also be useful. If your caching algorithm has a high hit rate, you might be able to get by with a much smaller spool (5 or 10 Meg). You might also want to distribute the central "machine". For instance, one "central" machine might hold comp.sources.unix and a different one rec.*.

As I mentioned above, I don't think that the current C-news/nntp implementation can handle Scenario C. Several parts are missing. I don't think C-news can expire articles based on readership patterns. I also don't think that nntpd can keep track of a local spool and potentially several remote sources of articles, and present them to the remote news reader as if they all came from the same machine.
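To make the caching behavior in Scenario C concrete, here is a minimal sketch in Python. It is only a toy model under stated assumptions, not how nntpd or C-news actually work: the class name CachingSpool, its methods, and the plain dictionary standing in for the central server are all made up for illustration, and least-recently-read eviction is just one possible reading of "expire articles that haven't been read recently".

```python
from collections import OrderedDict


class CachingSpool:
    """Toy model of a local spool acting as a cache in front of a central spool.

    Hypothetical illustration only; none of these names exist in nntpd or C-news.
    """

    def __init__(self, central, capacity_bytes):
        # Assumption: the central server is modeled as a dict of
        # message-id -> article text.
        self.central = central
        self.capacity_bytes = capacity_bytes
        # Local spool, ordered from least recently read to most recently read.
        self.local = OrderedDict()
        self.used_bytes = 0

    def list_ids(self):
        """Present local and central articles as one merged listing."""
        return sorted(set(self.local) | set(self.central))

    def fetch(self, msg_id):
        """Return an article, pulling it from the central spool on a miss."""
        if msg_id in self.local:
            # Cache hit: mark as recently read; the central spool is not consulted.
            self.local.move_to_end(msg_id)
            return self.local[msg_id]
        # Cache miss: raises KeyError if the central spool lacks the article.
        article = self.central[msg_id]
        self._store(msg_id, article)
        return article

    def _store(self, msg_id, article):
        size = len(article.encode("utf-8"))
        if size > self.capacity_bytes:
            return  # too large to cache; it is served without being stored
        # Expire least recently read articles until the new one fits.
        while self.used_bytes + size > self.capacity_bytes:
            _, old = self.local.popitem(last=False)
            self.used_bytes -= len(old.encode("utf-8"))
        self.local[msg_id] = article
        self.used_bytes += size
```

In this model an article already in the local spool is returned by fetch without touching the central dictionary at all, which is the property that would let the local machines keep serving cached articles while the central machine is down. A real implementation would also have to deal with article numbering, the active file, and several central sources, none of which is modeled here.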
I would be interested in hearing any comments on the above. Specifically:

Does this seem like a common problem for large organizations?

Does this mixture of nntp, C-news and caching concepts look like a good solution to the problem?

Are the nntp or C-news authors doing any work in this area?

What kinds of changes would be needed to the C-news and nntp sources to make Scenario C work?

Richv
richv%hpinddu.cup.hp.com@hplabs.hpl.hp.com