Path: utzoo!attcan!uunet!ssbell!kent
From: kent@ssbell.UUCP (Kent Landfield)
Newsgroups: comp.sources.d
Subject: Re: A few questions/comments on Rkive
Keywords: long rkive archive sources USENET
Message-ID: <520@ssbell.UUCP>
Date: 6 Jul 89 08:15:08 GMT
References: <1123@ssp15.idca.tds.philips.nl>
Reply-To: kent@ssbell.UUCP (Kent Landfield)
Organization: Sterling Software, FSG-IMD, Bellevue, NE.
Lines: 152

In article <1123@ssp15.idca.tds.philips.nl> jos@idca.tds.PHILIPS.nl (Jos Vos) writes:
>After reading quickly through the accompanying documentation of rkive,
>I have the following remarks (please don't flame if an answer can be
>found in the documentation - I read it *quickly*):
>
>- First of all, it looks GREAT. I hope to start using it soon!
>  Besides sources I want (have) to use it for a number of "normal"
>  newsgroups too.
>
>- It is mentioned in rkive(1) that an existing file is (by default)
>  not overwritten. What happens then?

rkive handles it differently depending on whether or not the article is
a REPOST. If rkive detects that the destination (target) filename
already exists and the article is a...

NON-REPOST Article:
    Whenever a duplicate is encountered, rkive creates a problems
    directory (if necessary) as specified in the PROBLEMS line of the
    rkive.cf configuration file. It then stores the inbound article in
    the problems directory, within a subdirectory named after the
    newsgroup in which the duplicate was found. The archive
    administrator(s) specified in rkive.cf are mailed a message
    describing what has occurred. The original in the archive is not
    overwritten; the duplicate must then be cleaned up manually.

REPOST Article:
    Depending on how the software is compiled, REPOSTs are currently
    handled in one of three ways. With all three methods the archive
    administrator is notified of the occurrence via e-mail.

    MV_ORIGINAL
        The original article is moved into a subdirectory named
        "Originals" in the problems directory.
        The inbound reposted article is then placed into the archive
        in the correct position. (My favorite..:-))

    ADD_REPOST_SUFFIX
        If ADD_REPOST_SUFFIX is defined, every repost has the string
        given by the REPOST_SUFFIX define appended to its archive
        filename, so a repost of elm/part07 would appear in the
        archive as elm/part07-repost prior to any compression.
        (Careful with this one folks..)

    No reposting defines specified:
        The inbound article is placed into the archive in the correct
        position only if the original article is not already in the
        archive. Otherwise the repost is placed in the problems
        directory, just as a normal duplicate article is.

> This area of problems is also indicated in the IDEAS file: a
> more flexible naming scheme should be possible besides the
> article number (and the other two). E.g. a format
> string using % notations for time parameters (day, hour, seconds).
> Also enabling the use of a user program that generates the filename
> (without the directory) would be a possibility: this is the most
> generic way and quite easy to implement (but not efficient...).

The IDEAS file describes the need for an alternate way to archive
newsgroups that do not support the auxiliary headers. This is necessary
because the Article-Number method relies on the "news subsystem" naming
scheme. If the news system's numbering were restarted from scratch, or
the entire archive were moved to a different machine, problems could
occur because of the potential for duplicate filenames. This is *not*
something you do every day, but it is a problem that *can* be avoided.

A patch that adds another archiving method is in testing right now and
will be released next week. Chronological archiving support has been
added, which allows articles to be archived in either of two formats:
    volumeYY/MOY/YYMMDD.II   or   volumeYY/YYMMDD.II

where
    YY  - two-digit year
    MOY - month name: Jun, Jul, etc. (table configurable)
    MM  - two-digit month
    DD  - two-digit day
    II  - daily issue number, i.e. the article's sequence number
          within that day, in the order of processing

example:
    volume89/Jul/890706.01   or   volume89/890706.01

I agree that a generic hook is needed for the actual storage vehicle, so
as to support new methods over distributed media. That is in the works,
although *any* and *all* ideas are welcome and encouraged...

>- How it is known whether an article is already archived?
>
>  The previous problem becomes BIG if it can only be concluded
>  that an article is already archived because the file exists...

The test of whether an article is already archived is done by checking
whether the archive file exists. I'm not sure what you mean by BIG. I
have been running rkive since February and I have not moved my archive
to another machine or restarted my news numbering once. :-) (Wait till
I put up Cnews though :-))

Please remember, this archiver was initially designed as a sources
archiver. I have added the Chronological method, which solves the
problems of restarting the news system and moving the archive that
could arise with Article-Number archiving. You can now archive
non-sources groups just as effectively as sources groups. Well, as
soon as the patch is posted next week.. :-)

>- How are crosspostings handled?

Currently, crosspostings are *not* handled. rkive archives the
newsgroups that you specify in the rkive.cf configuration file. It
blindly ignores crosspostings and worries only about the target
newsgroup. What does this mean? If you have specified that you wish to
archive comp.sources.unix and comp.sources.d and the monthly
informational posting goes out, you will currently get *two* copies.....

This is a recognized deficiency. rkive needs to check whether any of the
crossposted groups are being archived as well and, if so, attempt to
link the files.
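The linking step described above is not implemented in rkive yet. As a rough sketch only, assuming a POSIX system, it might be done in C by trying a hard link with link(2) and, if that fails with EXDEV (the two archive paths are on different filesystems), falling back to a plain copy. The helper names link_crosspost() and copy_file() are hypothetical and do not come from rkive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of rkive): copy 'from' to a new file 'to'.
 * Used when a hard link is impossible. Refuses to overwrite an existing
 * file (O_EXCL). Returns 0 on success, -1 on error. */
static int copy_file(const char *from, const char *to)
{
    char buf[8192];
    ssize_t n;
    int in, out, rc = 0;

    if ((in = open(from, O_RDONLY)) < 0)
        return -1;
    if ((out = open(to, O_WRONLY | O_CREAT | O_EXCL, 0644)) < 0) {
        int saved = errno;      /* keep EEXIST etc. visible to the caller */
        close(in);
        errno = saved;
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        /* Treat a short write as an error rather than retrying. */
        if (write(out, buf, (size_t)n) != n) {
            rc = -1;
            break;
        }
    }
    if (n < 0)
        rc = -1;
    close(in);
    if (close(out) < 0)
        rc = -1;
    if (rc < 0)
        unlink(to);             /* do not leave a partial copy in the archive */
    return rc;
}

/* Hypothetical helper (not part of rkive): make 'crosspost' refer to the
 * already-archived article 'archived'. Tries a hard link first; if the
 * paths are on different filesystems (EXDEV), falls back to copying.
 * Returns 0 on success, -1 on failure (errno == EEXIST if 'crosspost'
 * already exists). */
int link_crosspost(const char *archived, const char *crosspost)
{
    assert(archived != NULL && crosspost != NULL);
    if (link(archived, crosspost) == 0)
        return 0;
    if (errno == EXDEV)
        return copy_file(archived, crosspost);
    return -1;
}
```

Neither path overwrites an existing file: link(2) and the O_EXCL open both fail with EEXIST, which is consistent with rkive's default of never overwriting an archived article. The copy fallback costs disk space; a symbolic link would be an alternative on systems that support them.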
I say "attempt" since my archives here at ssbell reside on 4 different
filesystems, and as soon as I finish the distributed version they will
be scattered across as many machines. :-)

>- Is it not possible to use rkive as a program directly
>  from the sys file (that is, with the article as stdin)?
>  Probably not (the first problem SHOULD be solved then).

No. rkive is meant to run from cron, not to receive articles on stdin.
To be quite honest, I never really thought about doing it that way, but
if I ... :-) :-) Currently, that is not in the works.

> I think this is a much cleaner way of archiving the news, isn't it?
> (who knows what happens with /usr/spool/news before tonight :-))

On my machines, I know... :-)

>I know I could find the answer of some questions in the code, but
>I didn't have time to look at that now. And besides that, much
>people (?) will sooner or later have the same questions.

Please, ask away! I *expected* that I would be answering questions.
Better sooner than later. I have been receiving some *GREAT* ideas from
the net on ways to improve and enhance rkive's functionality. Thanks!
I answer my mail, so if you have not gotten an answer back, I probably
didn't get your message.

I am planning on posting the patch to comp.sources.bugs and sending a
copy to rich as well. Distributed archiving is next on my list. The
"random software downloader", for retrieving complete packages,
patches and all, is also in development. Anyone want to help me name
the "random software downloader"? get is already taken and rsd sounds
so bland.. :-) :-)

>--
>###### Jos Vos ###### Internet jos@idca.tds.philips.nl ######

Thanks Jos!

-Kent+
---
Kent Landfield                  UUCP:     kent@ssbell
Sterling Software FSG/IMD       INTERNET: kent@ssbell.uu.net
1404 Ft. Crook Rd. South        Phone:    (402) 291-8300
Bellevue, NE. 68005-2969        FAX:      (402) 291-4362