Newsgroups: news.software.b
Path: utzoo!henry
From: henry@utzoo.uucp (Henry Spencer)
Subject: Re: What If...I remove "/usr/lib/news/history*" ?
Message-ID: <1990Mar2.044311.16160@utzoo.uucp>
Organization: U of Toronto Zoology
References: <490@limbo.Intuitive.Com>
Date: Fri, 2 Mar 90 04:43:11 GMT

In article <490@limbo.Intuitive.Com> taylor@limbo.Intuitive.Com (Dave Taylor) writes:
>What if I were to remove the following files from /usr/lib/news ?
> history
> history.dir
> history.pag
>As far as I am aware -- and keep in mind that the only news reader
>we have installed on this site is "rn" -- the only purpose that the
>files serve are to ensure that duplicate articles aren't allowed.
>Am I right? ...

Nope, sorry.  They have one or two other notable roles.  In particular,
expire relies completely on "history", and cannot function without it.
Removing the .dir and .pag files will break duplicate checking and some
of the fancier functions in sophisticated news readers, but otherwise
shouldn't do anything awful that I can think of.

I do suggest learning more about the functioning of the news software
before trying such drastic tampering with its databases, though.

>If we remove this, I assume that what I'd need to do would be to
>write a new "unpack news batches" program, right?  That'd be okay;
>I'm willing to do that...in fact, as far as I can tell, it isn't
>too much work either...

Ha ha.  That's what I told Geoff five or six years ago.  I'm not sure
he's forgiven me for it.  What you are talking about doing is completely
reinventing rnews/relaynews (depending on which news you're running),
and that is *not* a small job.  It's 4000+ lines of code in C News, and
with news at its current volume, you'd better pay a whole lot of
attention to performance when you do it, because otherwise it's easy to
spend all day processing news and still not be able to keep up.

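To make the duplicate-checking role of the indexed history files concrete, here is a minimal sketch in Python.  It is an illustration only, not C News code: the function names, the use of Python's dbm.dumb module, and the stored "location" value are all assumptions, and the real history file format is not reproduced.  The only point is that rejecting a duplicate needs a keyed lookup by Message-ID, which is exactly what goes away when the database is deleted.

```python
import dbm.dumb
import os
import tempfile


def open_history(path):
    """Open (or create) a keyed index of seen Message-IDs (hypothetical helper)."""
    return dbm.dumb.open(path, "c")


def accept_article(history, message_id, location):
    """Record a new article and return True; return False for a duplicate.

    `location` is an illustrative placeholder for wherever the article was filed.
    """
    key = message_id.encode("utf-8")
    if key in history:
        # Already seen: a duplicate, so leave the index untouched.
        return False
    history[key] = location.encode("utf-8")
    return True


def demo():
    """Offer the same Message-ID twice and report both outcomes."""
    with tempfile.TemporaryDirectory() as tmp:
        history = open_history(os.path.join(tmp, "history"))
        try:
            first = accept_article(history, "<490@limbo.Intuitive.Com>", "article-1")
            second = accept_article(history, "<490@limbo.Intuitive.Com>", "article-2")
            return first, second
        finally:
            history.close()
```

In this sketch the first offer of a Message-ID is accepted and the second is refused.  A real implementation also has to worry about locking, crash recovery, and throughput at full news volume, none of which is attempted here.
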
>simply put the article into its own temp file, check its MessageID
>against those already on the machine, then if unique...

How are you going to check the MessageID against the others on the
machine if you've deleted the database that keeps track of such things?
That's what the history.* files are for!

>Really what I'd like to do is to write an unpacker that will
>immediately throw away articles from groups that appear/don't appear
>in a file.  The goal would be to have the file generated via a modified
>pexpire(1L) program to reflect JUST the groups that people are actually
>actively reading on the machine.  ALL other articles would vanish
>without a trace, never to take up disk space at all!

You can do this with C News, by changing the fourth field of active-file
lines to "x".  All you need to do is the data gathering to figure out
which newsgroups to do it to (which isn't as easy as it looks, e.g. if
you've got users who don't always read all the newsgroups they subscribe
to) and a little bit of code to modify the active file accordingly.

>Creating a nice
>clean piece of code that is easy to understand, maintain, and modify
>would be a good side-benefit

We fancy we've done this with C News.  It was/is a whole lot more work
than we expected.  You might want to look at what we've done before you
strike out to write your own.

> as would the incredibly faster expire
>that could be written too (like "find . -mtime +4 -exec /bin/rm"!)

Sorry, find is not a particularly fast way to expire things.  C News
expire is faster, last time I compared timings, and it gives you much
more control.

>This all really hinges around the history file, though.  Clearly, when
>my expires take many many hours to run, it's because they're munging
>through the slow and painful process of continually updating the DBM
>history database ... (right?) ... I mean, I can run "fixactive(1L)"
>and have it check *every* article in my /usr/spool/news directory in
>under 2 minutes total!

If your expires take many hours to run, you're running the old B News
expire, which was and is an incredible hog.  C News expire is vastly
faster, even with dbm updates.

Also, I think you've misunderstood fixactive -- if I've remembered
correctly what it does, it only looks at each *directory* of
/usr/spool/news, *not* at each article.  The difference is, uh,
important.

> I welcome thoughts on this, either here on the net or via
> email...and if you're interested in a similar piece of software,
> please feel free to drop me a note with your requirements too.

Dave, I'm sorry, but from this article I get a very strong impression
that you simply don't know the news system well enough to understand
what you'd be undertaking.  It's not that easy.  I speak from experience.
-- 
MSDOS, abbrev:  Maybe SomeDay |     Henry Spencer at U of Toronto Zoology
an Operating System.          | uunet!attcan!utzoo!henry henry@zoo.toronto.edu