Path: utzoo!attcan!uunet!ssbell!kent
From: kent@ssbell.UUCP (Kent Landfield)
Newsgroups: comp.sources.d
Subject: Re: A few questions/comments on Rkive
Keywords: long rkive archive sources USENET
Message-ID: <520@ssbell.UUCP>
Date: 6 Jul 89 08:15:08 GMT
References: <1123@ssp15.idca.tds.philips.nl>
Reply-To: kent@ssbell.UUCP (Kent Landfield)
Organization: Sterling Software, FSG-IMD, Bellevue, NE.
Lines: 152

In article <1123@ssp15.idca.tds.philips.nl> jos@idca.tds.PHILIPS.nl (Jos Vos) writes:
>After reading quickly through the accompanying documentation of rkive,
>I have the following remarks (please don't flame if an answer can be
>found in the documentation - I read it *quickly*):
>
>-  First of all, it looks GREAT. I hope to start using it soon!
>   Besides sources I want (have) to use it for a number of "normal"
>   newsgroups too.
>
>-  It is mentioned in rkive(1) that an existing file is (by default)
>   not overwritten. What happens then?

rkive handles it differently depending on whether the article is a
REPOST or not.  If rkive detects that the destination (target) file
already exists and the article is a ...

NON-REPOST Article:
	In the event that any duplicate is encountered, rkive creates a
	problems directory (if necessary) as specified in the PROBLEMS
	line of the rkive.cf configuration file. It then stores the 
	inbound article in the problems directory within a subdirectory
	that reflects the name of the newsgroup the duplicate was found
	in. The archive administrator(s) specified in the rkive.cf are 
	mailed a message indicating what has occurred. The original in
	the archive is not overwritten.  The duplicate then becomes a 
	matter of manual cleanup.

REPOST Article:
	Depending on how the software is compiled, REPOSTS are currently
	handled in one of three ways.  In all three, the archive
	administrator is notified of the occurrence via e-mail.
	
	 MV_ORIGINAL
	     The original article is placed (moved) into a subdirectory in
	     the problems directory named "Originals". The inbound reposted 
	     article is then placed into the archive in the correct position.
	     (My favorite..:-))
	
	 ADD_REPOST_SUFFIX 
	     If ADD_REPOST_SUFFIX is defined, all reposts will have the 
	     string specified in the REPOST_SUFFIX define appended to the 
	     archive filename so that a repost of elm/part07 would appear 
	     in the archive as elm/part07-repost prior to any compression.
	     (Careful with this one folks..)
	
	 No Reposting Defines specified:
	    The inbound article is placed into the archive in the correct
	    position only if the original article is not already in the
	    archive. Otherwise the reposted article is placed in the problems
	    directory, just as a normal duplicate article is.
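
The decision logic above can be sketched roughly as follows. This is an
illustration in Python, not rkive's actual code: the function name, its
arguments, and passing the policy at run time (rather than choosing it
with a compile-time define) are all assumptions, and the mail sent to
the administrator is left out. It also assumes the repost suffix is only
applied when the target already exists.

```python
import os
import shutil

def archive_article(src, target, problems_dir, newsgroup,
                    is_repost=False, policy=None, repost_suffix="-repost"):
    """Store src in the archive and return the path it ended up at.

    policy is None, "MV_ORIGINAL" or "ADD_REPOST_SUFFIX"; these mirror
    the defines named above, but selecting them at run time is a
    simplification made for this sketch.
    """
    if not os.path.exists(target):
        # No conflict: archive the article in its normal position.
        parent = os.path.dirname(target)
        if parent:
            os.makedirs(parent, exist_ok=True)
        shutil.copyfile(src, target)
        return target

    if is_repost and policy == "MV_ORIGINAL":
        # Move the original aside, then put the repost in its place.
        originals = os.path.join(problems_dir, "Originals")
        os.makedirs(originals, exist_ok=True)
        shutil.move(target, os.path.join(originals, os.path.basename(target)))
        shutil.copyfile(src, target)
        return target

    if is_repost and policy == "ADD_REPOST_SUFFIX":
        # Keep the original and store the repost next to it.
        dest = target + repost_suffix
        shutil.copyfile(src, dest)
        return dest

    # Plain duplicate, or a repost with no reposting define: the
    # original stays untouched and the newcomer goes to the problems
    # directory, under a subdirectory named after the newsgroup.
    dup_dir = os.path.join(problems_dir, newsgroup)
    os.makedirs(dup_dir, exist_ok=True)
    dest = os.path.join(dup_dir, os.path.basename(target))
    shutil.copyfile(src, dest)
    return dest
```

In every branch except the first, the real program also mails the
archive administrator; a caller of this sketch could do that whenever
the returned path differs from the target or a repost was handled.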
	
>   This area of problems is also indicated in the IDEAS file: a
>   more flexible naming scheme should be possible besides the
>   article number (and the other two). E.g. a format
>   string using % notations for time parameters (day, hour, seconds).
>   Also enabling the use of a user program that generates the filename
>   (without the directory) would be a possibility: this is the most
>   generic way and quite easy to implement (but not efficient...).

The IDEAS file describes the need for an alternate way to archive newsgroups
that do not support the auxiliary headers. This is necessary since the
Article-Number method uses the "news subsystem" naming scheme. If a news
system's numbering were restarted from scratch, or the entire archive were
moved to a different machine, problems could occur due to the potential for
duplicate filenames.  This is *not* something that you do every day but it
is a problem that *can* be avoided.

A patch now in testing, to be released next week, adds another method
of archiving. Chronological archiving support has been added, which
allows articles to be archived in a format of...

	volumeYY/MOY/YYMMDD.II or volumeYY/YYMMDD.II where
		YY  - two digit year
		MOY - month name: Jun, Jul, etc. (table configurable)
		MM  - two digit month
		DD  - two digit day
		II  - daily issue number, i.e. the number of the article
		      in the order of processing that day
example:
	volume89/Jul/890706.01 or volume89/890706.01
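
As a rough illustration of the naming scheme (in Python, and not taken
from the patch itself), the two formats could be generated like this.
The function name, the use_month_dir flag, and the MONTH_NAMES table
are hypothetical; padding the issue number to two digits is inferred
only from the example above.

```python
import datetime

# Stand-in for the configurable month table mentioned above.
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def chrono_path(date, issue, use_month_dir=True, month_names=MONTH_NAMES):
    """Build volumeYY/MOY/YYMMDD.II, or volumeYY/YYMMDD.II without MOY."""
    yy = date.year % 100                      # two digit year
    name = "%02d%02d%02d.%02d" % (yy, date.month, date.day, issue)
    parts = ["volume%02d" % yy]
    if use_month_dir:
        parts.append(month_names[date.month - 1])
    parts.append(name)
    return "/".join(parts)

first = datetime.date(1989, 7, 6)
print(chrono_path(first, 1))                       # volume89/Jul/890706.01
print(chrono_path(first, 1, use_month_dir=False))  # volume89/890706.01
```

Because the name depends only on the date and the order of processing,
it does not change if the news system's article numbers are reset.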

I agree a generic hook is needed for the actual storage vehicle so
as to support new methods over distributed media. That is in the works
although *any* and *all* ideas are welcome and encouraged...

>-  How it is known whether an article is already archived?
>
>   The previous problem becomes BIG if it can only be concluded
>   that an article is already archived because the file exists...

The test as to whether an article is already archived is done by checking 
if the archive file exists. I'm not sure what you mean by BIG. I have been
running rkive since February and I have not moved my archive to another
machine or 
restarted my News numbering once. :-) (Wait till I put up Cnews though :-))
Please remember, this archiver was initially designed as a sources archiver. 
I have added the Chronological method which solves the problems of restarting 
the news system and moving the archive that could have been a problem with 
Article-Number archiving.  You can now archive non-sources groups just as 
effectively as sources groups. Well, as soon as the patch is posted next 
week.. :-) 

>-  How are crosspostings handled?

Currently, crosspostings are *not* handled. rkive archives the newsgroups
that you specify in the rkive.cf configuration file.  It blindly ignores
crosspostings and worries only about the target newsgroup.  What does this
mean? If you have specified that you wish to archive comp.sources.unix
and comp.sources.d and the monthly informational posting goes out, you
will currently get *two* copies. This is a recognized deficiency. rkive
needs to check whether any of the crossposted groups are being archived as
well and attempt to link the files. I say attempt since my archives here at
ssbell
reside on 4 different filesystems and as soon as I finish the distributed
version, they will be scattered on as many machines. :-)
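
To show why linking can only be an "attempt", here is one possible
shape for it, sketched in Python. rkive does not do this today; the
function name link_or_copy is hypothetical. A hard link only works when
both paths are on the same filesystem, so the sketch falls back to an
ordinary copy when the link fails for that reason.

```python
import errno
import os
import shutil

def link_or_copy(existing, new_path):
    """Make new_path refer to the article already archived at existing.

    Returns "linked" if a hard link was made, "copied" if a separate
    copy had to be written instead.
    """
    parent = os.path.dirname(new_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    try:
        os.link(existing, new_path)
        return "linked"
    except OSError as e:
        # EXDEV: the two paths are on different filesystems.
        # EPERM: the filesystem does not support hard links.
        # Any other error (e.g. new_path already exists) is re-raised.
        if e.errno not in (errno.EXDEV, errno.EPERM):
            raise
        shutil.copyfile(existing, new_path)
        return "copied"
```

With archives spread over several filesystems, the "copied" case would
still leave two copies on disk; it just makes that outcome explicit.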

>-  Is it not possible to use rkive as a program directly
>   from the sys file (that is, with the article as stdin)?
>   Probably not (the first problem SHOULD be solved then).

No. rkive is meant to run from cron and not receive the articles from stdin.
To be quite honest, I never really thought about doing it that way but
if I ... :-) :-) Currently, that is not in the works.

>   I think this is a much cleaner way of archiving the news, isn't it?
>   (who knows what happens with /usr/spool/news before tonight :-))

On my machines, I know... :-)

>I know I could find the answer of some questions in the code, but
>I didn't have time to look at that now. And besides that, much
>people (?) will sooner or later have the same questions.

Please, ask away! I *expected* that I would be answering questions. Better 
sooner than later. I have been receiving some *GREAT* ideas from the net 
as to ways to improve and enhance rkive's functionality. Thanks!  I answer
my mail, so if you have not gotten an answer back, I probably didn't get
it. I am planning on posting the patch to comp.sources.bugs and sending
a copy to Rich as well.

Distributed archiving is next on my list. Also the "random software downloader"
for retrieving complete packages, patches and all, is in development.  Anyone
want to help me name the "random software downloader"? "get" is already
taken and "rsd" sounds so bland.. :-) :-)

>-- ######   Jos Vos   ######   Internet   jos@idca.tds.philips.nl   ######

			Thanks Jos!
				-Kent+
---
Kent Landfield               UUCP:     kent@ssbell
Sterling Software FSG/IMD    INTERNET: kent@ssbell.uu.net
1404 Ft. Crook Rd. South     Phone:    (402) 291-8300 
Bellevue, NE. 68005-2969     FAX:      (402) 291-4362