Xref: utzoo news.sysadmin:502 news.software.b:1059
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!amdahl!oliveb!jerry
From: jerry@oliveb.olivetti.com (Jerry Aguirre)
Newsgroups: news.sysadmin,news.software.b
Subject: Re: Keeping multiple news machines in sync
Message-ID: <13255@oliveb.olivetti.com>
Date: 19 Jan 88 21:12:58 GMT
References: <6263@oberon.USC.EDU> <13176@oliveb.olivetti.com> <19044@felix.UUCP>
Reply-To: jerry@oliveb.UUCP (Jerry Aguirre)
Organization: Olivetti ATC; Cupertino, Ca
Lines: 76

In article <19044@felix.UUCP> bytebug@felix.UUCP (Roger L. Long) writes:
>Jerry apparently failed to note that Michael stated that oberon was a 750.
>Felix, too, is a 750.  And after being spoiled by the speed of 780s and
>785s, doing anything significant on a 750 seems to take forever.  Doing
>a find on a news filesystem on a 750 while at the same time the 750 is
>receiving news via NNTP and probably doing one or two UUCP transfers is
>bound to be S-L-O-W.
Roger failed to note that oliveb is a 750...  I am intimately familiar
with the blazing speed of a 750 :-)  I just did a:

    "nice find /usr/spool/news -type f -newer filename -print"

and it took about 35 minutes (412 cpu seconds) to find 422 files.
(Searching 28 days of news, your mileage may vary.)  Of course tar is a
pig but it only has to deal with the new files, not the whole news
directory.
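
For anyone who would rather script it, the find-then-tar step can be sketched
in Python roughly as follows.  This is only an illustration, not what was run
on oliveb: the function names are invented for the example, and it assumes
that a plain modification-time comparison is an adequate stand-in for find's
"-newer" test.

```python
import os
import stat
import tarfile


def newer_files(spool_dir, reference_file):
    """Yield regular files under spool_dir modified after reference_file.

    Rough equivalent of: find spool_dir -type f -newer reference_file -print
    """
    cutoff = os.stat(reference_file).st_mtime
    for dirpath, _dirnames, filenames in os.walk(spool_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat: do not follow symlinks, like -type f
            if stat.S_ISREG(info.st_mode) and info.st_mtime > cutoff:
                yield path


def archive_new_articles(spool_dir, reference_file, tar_path):
    """Tar only the newer files; return their names relative to spool_dir."""
    names = []
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(newer_files(spool_dir, reference_file)):
            rel = os.path.relpath(path, spool_dir)
            tar.add(path, arcname=rel)
            names.append(rel)
    return names
```

The walk still touches every directory in the spool, so it does nothing for
the 35 minutes; only the tar step is limited to the new files.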

>Thus, an alternate suggestion, which I've been mulling about from time
>to time here at FileNet:
>- modify NNTP's ihave/sendme to use the article pathname instead of 
>  Message-ID.
I played with something like this.  I had the inews ihave/sendme send
both the article ID and pathname.  The advantage was that the receiver
could "uux !rnews sender!pathname" to fetch the article without the
delay and overhead of using "sendme".

>- modify inews (or integrate this functionality into the NNTP server) to
>  do little more than extract the Xref line from the news header and use
>  that to cross-link the article, if required, and update the lib/active
>  file.  You'd also have to update the machine name in the Xref header
>  when storing on the local machine, so that rn would recognize it.
A neat idea, except that articles that are not cross-posted don't get an
Xref line.  Inews wouldn't be able to figure out the article number for
those.  The combination of getting the filename and checking for an Xref
line should handle that, though.
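
A rough sketch of that combination, in Python for brevity.  Nothing here is
taken from inews or the NNTP server: the function names are hypothetical, and
the pathname is assumed to be relative to the spool directory (for example
"news/sysadmin/502").

```python
def parse_xref(header_line):
    """Split 'Xref: host group:num ...' into (host, [(group, num), ...])."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "xref":
        raise ValueError("not an Xref header")
    fields = value.split()
    host, entries = fields[0], fields[1:]
    locations = []
    for entry in entries:
        group, _, number = entry.rpartition(":")
        locations.append((group, int(number)))
    return host, locations


def location_from_pathname(pathname):
    """Fallback when there is no Xref: 'news/sysadmin/502' -> ('news.sysadmin', 502)."""
    directory, _, number = pathname.strip("/").rpartition("/")
    return directory.replace("/", "."), int(number)


def article_locations(header_lines, pathname):
    """Use the Xref line if the article has one, else the sender's pathname."""
    for line in header_lines:
        if line.lower().startswith("xref:"):
            return parse_xref(line)[1]
    return [location_from_pathname(pathname)]


def rewrite_xref(header_line, local_host):
    """Replace the sender's machine name so a local reader accepts the header."""
    _host, locations = parse_xref(header_line)
    pairs = ["%s:%d" % (group, number) for group, number in locations]
    return "Xref: " + " ".join([local_host] + pairs)
```

Given the Xref line at the top of this very article,
parse_xref("Xref: utzoo news.sysadmin:502 news.software.b:1059") yields the
host "utzoo" and the two group/number pairs.  The article numbers are only
useful on the receiver if both machines really do number articles identically.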

Perhaps even simpler would be to have a new option ("X") in the sys
file that worked like "F" but included all the cross-posted file names,
not just the first link.  This would avoid the "find" because you could
just "tar" those files to the secondary server.

>Well, not quite.  Using
>
>	awk '{print $1}' </usr/lib/news/history | wc
>
>I got 31643 article-ids totaling 716225 bytes.  Given that perhaps half
>of those are already expired which wouldn't need to be stored, we could
>assume a number of around a third of a megabyte (also assuming that the
>reader had time to actually read all of the incoming news).  Perhaps a
>better figure would be from the list of articles I've read that are still
>available on the system.  I don't read a lot of news; it would appear that
>I read 486 unique articles for a total of 10429 bytes of article-ids in
>the last two weeks.  Not unreasonable, and in fact somewhat impressive,
>given that my .newsrc file is nearly that large.  So maybe not as
>unacceptable as you might think!

You are right: because one only has to keep track of what the user has
read, the volume is reduced considerably.  (I probably have a larger
history file here because I expire at 28 days.)

However, keeping only the article IDs is not enough.  First of all,
you need random access to them so that as each article is scanned you
can decide whether to skip it or not.  If they were stored by newsgroup
then you could read just the IDs for that group into memory and hash on
them.  Otherwise you need the overhead of an access method.
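
To make the per-newsgroup case concrete, here is a minimal sketch in Python.
It assumes a hypothetical per-user file for each newsgroup holding one
already-read article ID per line; no existing news reader stores its state
this way, and the function names are invented.

```python
def load_read_ids(group_file_lines):
    """Read one newsgroup's already-read article IDs into a hashed set."""
    return {line.strip() for line in group_file_lines if line.strip()}


def should_skip(message_id, read_ids):
    """Constant-time membership test as each article is scanned."""
    return message_id in read_ids
```

Only the IDs for the group being read are in memory at any time, which is
what keeps this cheap compared with a general access method over one big file.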

What really changes is how one sequences through the articles.  If the
file numbers are not used then there is no information to tell where to
start or in what sequence to read.  Perhaps rnews could maintain a file
for each newsgroup with a list of current article IDs and corresponding
filenames.  This would allow opening one file and quickly checking
article IDs to find unread articles.
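
Such a per-newsgroup file might be used along these lines.  Again this is a
Python sketch under assumptions: the "article-ID filename" line format and
the function names are made up for illustration, and rnews maintains no such
file today.

```python
def parse_index(lines):
    """Parse '<article-ID> filename' lines, keeping the order they were written in."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            entries.append((fields[0], fields[1]))
    return entries


def unread_articles(index_entries, read_ids):
    """Return filenames, in index order, whose article ID has not been read."""
    return [filename
            for message_id, filename in index_entries
            if message_id not in read_ids]
```

The order of lines in the index supplies the reading sequence that the file
numbers supply now, so the reader opens one file per group instead of
scanning the directory.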

This would represent some pretty radical changes to the way news keeps
track of articles.  Anybody for "D" news?
				Jerry Aguirre