Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!novavax!twwells!bill
From: bill@twwells.com (T. William Wells)
Newsgroups: alt.hackers
Subject: Re: hacking comp.archives and anonymous FTP
Message-ID: <1990Jan11.174845.4133@twwells.com>
Date: 11 Jan 90 17:48:45 GMT
References: <10614@stag.math.lsa.umich.edu>
Organization: None, Ft. Lauderdale, FL
Lines: 47
Approved: bill@twwells.com

In article <10614@stag.math.lsa.umich.edu> emv@math.lsa.umich.edu (Edward Vielmetti) writes:
: Problem one is finding new stuff as it's announced.  Currently the
: best solution seems to be the 'gnus' newsreader, armed with a KILL
: file for every group that I care about.

I run C news here. I added a line to my newsrun script, just
before the call to relaynews, to run my own filter on the
incoming batch. This is, right now, a tiny program to just read
each message from the file and check it for some keywords I'm
interested in. It mails me the message IDs for anything that has
the keywords.

For a bigger site, I'd also modify the inews script. (Being the
only poster, I certainly don't need an automatic program to catch
my postings. :-) For a B news site, you'd actually have to fiddle
with the inews program.

Why did I do it this way? The amount of disc crunching goes way
down if you process the incoming in one big batch, which is how I
get it. The main problem with this is that it doesn't honor
cancellations and the like, but, then again, those aren't all
that reliable anyway.

Now, the hard part is interpreting the postings. Having written a
grammar checker recently, I'm right into techniques for doing
this. I'll be giving it a whirl sometime and I'll let y'all know
how it goes in picking these things out.

: While that list is useful, it really doesn't tell you the right
: thing.  Each line is an inventory of what the site has, but
: that changes rapidly and arbitrarily.
: A better approach might
: be to list every package, then all the sites that carry it;
: that's straightforward enough to untangle with a little work.
: You could even regularize things some, with the "primary" site
: for a package or home of a list going first, then other archives
: in an ordering which might be meaningful.  I.e.:

I've got a format for this already defined. It isn't as compact
as yours, however.

---
Bill                    { uunet | novavax | ankh } !twwells!bill
bill@twwells.com
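
For readers who want to try the batch filter described in the post, here
is a minimal sketch in Python. It is not the author's program. It assumes
the incoming batch has already been uncompressed and uses the usual
"#! rnews <byte count>" separator before each article. The keyword list,
the function names, and the choice to print the message IDs (so the output
can be piped to a mailer) are all illustrative assumptions.

```python
import sys

# Hypothetical keywords; the post does not say which ones were used.
KEYWORDS = (b"anonymous ftp", b"archive")


def split_batch(data):
    """Yield each article from an uncompressed '#! rnews <bytes>' batch."""
    pos = 0
    while pos < len(data):
        eol = data.index(b"\n", pos)
        fields = data[pos:eol].split()
        if len(fields) != 3 or fields[:2] != [b"#!", b"rnews"]:
            raise ValueError("bad batch separator: %r" % data[pos:eol])
        size = int(fields[2])
        start = eol + 1
        yield data[start:start + size]
        pos = start + size


def message_id(article):
    """Return the Message-ID header value, or None if it is missing."""
    head = article.split(b"\n\n", 1)[0]
    for line in head.split(b"\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"message-id":
            return value.strip().decode("ascii", "replace")
    return None


def matching_ids(data, keywords=KEYWORDS):
    """Return message IDs of articles containing any keyword (case-insensitive)."""
    hits = []
    for article in split_batch(data):
        text = article.lower()
        if any(k in text for k in keywords):
            mid = message_id(article)
            if mid is not None:
                hits.append(mid)
    return hits


# Assumed usage: pass the batch file name as the only argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as batch_file:
        for mid in matching_ids(batch_file.read()):
            print(mid)
```

Reading the whole batch in one pass matches the post's point about disc
activity: each article is looked at once, before it is filed. Like the
program described, this sketch ignores cancellations.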