Path: utzoo!attcan!lsuc!eci386!clewis
From: clewis@eci386.uucp (Chris Lewis)
Newsgroups: comp.sources.d
Subject: Re: A few questions/comments on Rkive
Keywords: long rkive archive sources USENET
Message-ID: <1989Jul13.160050.3478@eci386.uucp>
Date: 13 Jul 89 16:00:50 GMT
References: <1123@ssp15.idca.tds.philips.nl> <520@ssbell.UUCP> <1989Jul7.022708.4826@eci386.uucp> <523@ssbell.UUCP>
Reply-To: clewis@eci386.UUCP (Chris Lewis)
Organization: R. H. Lathwell Associates: Elegant Communications, Inc.
Lines: 52

In article <523@ssbell.UUCP> kent@ssbell.UUCP (Kent Landfield) writes:
>In article <1989Jul7.022708.4826@eci386.uucp> clewis@eci386.UUCP (Chris Lewis) writes:
># Actually, what might be better (from the point of view of trying to
># collect lots of articles before bothering the MAIL: people) is to parse
># a batch file.  For example, I have the following (C-news) sys file entry:
># 
>#     maps:comp.mail.maps/all:f:
># 
># Which places the file name of each article in comp.mail.maps, and I
># have a cron entry that runs a script that pulls each file name out
># and unpacks it, calls pathalias and sends mail to me.  
>
>Ok. I think I am being dumb here (it would not be a first) but I don't see 
>how this is really any different than what rkive does now. I can schedule 
>rkive to run via cron any time I wish and with as much frequency. The 
>difference is that this would get the file names from a different file/stdin 
>whereas the current rkive gets the file names from the news directory 
>structure. You are still dependent on expire since the file specified 
>must still exist in both the current and this approach when it is time to 
>"rkive" the file.  Like I said, I am probably just missing the point.

Thinking more on it, the expire argument is probably bogus, but:

The main advantage is that you don't have to rummage around in the directory,
possibly parse the files, and check your database to see whether you've
already unpacked it.  You know that every single file listed in the batch
is new and you've not seen it before.  In fact, with this approach you 
*NEVER* have to have rkive reread its own databases or scan directories - 
the index files are merely logs of what things rkive's already snarfed, and
the batch file is a list of names of files that rkive hasn't read yet.
Though, of course, you do have to be fairly careful not to clobber things
if they reappear, and you still have to read the control file to decide
what to do with each one.
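
A minimal sketch of the batch-file idea, in Python purely for illustration.
This is not rkive's code: the names drain_batch, handle_article and the
".work" suffix are made up, and I'm assuming each batch line starts with
the article's file name (anything after it on the line is ignored).

```python
import os


def drain_batch(batch_path, index_path, handle_article):
    """Process every article named in the batch file exactly once.

    Returns the list of file names that were actually handled.
    """
    work_path = batch_path + ".work"  # hypothetical scratch name

    # Move the batch aside first, so the news system can start a fresh
    # batch file while we work through the old one.
    try:
        os.rename(batch_path, work_path)
    except FileNotFoundError:
        return []  # nothing queued since the last run

    done = []
    with open(work_path) as batch, open(index_path, "a") as index:
        for line in batch:
            fields = line.split()
            if not fields:
                continue  # blank line
            name = fields[0]  # assumption: file name is the first field
            if not os.path.isfile(name):
                continue  # expired or cancelled before we got to it
            handle_article(name)
            index.write(name + "\n")  # the index is only a log
            done.append(name)

    os.remove(work_path)
    return done
```

Note that nothing here scans the news directory or rereads the index:
every name in the batch is taken to be new.  A real version would also
have to cope with a ".work" file left behind by a crashed run.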

[This discussion is probably moot because you've already implemented
a "fancy" version - what really bugs me is the map unpackers that people
write that go into the comp.mail.maps directory and run pathalias *only*
on what's in comp.mail.maps.  That means missing expired entries, getting
duplicate copies of maps (when you don't have supersede or someone goofed),
and being unable to compress the map files.  And, chances are, they're
running as root when someone puts a trojan into one of the maps...]

[re: MAIL: destination checking]
>... I made this a compile time decision by adding
>an ifdef around the getpwnam call.

Oops, I musta missed that somewhere.
-- 
Chris Lewis, R.H. Lathwell & Associates: Elegant Communications Inc.
UUCP: {uunet!mnetor, utcsri!utzoo}!lsuc!eci386!clewis
Phone: (416)-595-5425