Path: utzoo!utstat!helios.physics.utoronto.ca!jarvis.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!mips!apple!limbo!taylor
From: taylor@limbo.Intuitive.Com (Dave Taylor)
Newsgroups: news.software.b
Subject: What If...I remove "/usr/lib/news/history*" ?
Message-ID: <490@limbo.Intuitive.Com>
Date: 1 Mar 90 23:14:54 GMT
Reply-To: taylor@limbo.Intuitive.Com (Dave Taylor)
Organization: Intuitive Systems, Mountain View, CA: +011 (415) 966-1151
Lines: 61

I have this continuing problem with netnews in that it takes up too
much disk space (what's new ;-) and expires take incredibly long to
run.  So I was a'thinkin'...

What if I were to remove the following files from /usr/lib/news ?

	history
	history.dir
	history.pag

As far as I am aware -- and keep in mind that the only news reader
we have installed on this site is "rn" -- the only purpose these
files serve is to ensure that duplicate articles aren't accepted.
Am I right?  (I have heard some reference to commands in "rn" that
let you use the file for certain purposes, but we rarely seem to
use them, and could probably live without them without too much
difficulty: based on the '...' checking, it appears that "rn"
doesn't, for example, use the file to deal with ^P 'show previous
article in this discussion').

If we remove these files, I assume that what I'd need to do would be
to write a new "unpack news batches" program, right?  That'd be okay;
I'm willing to do that...in fact, as far as I can tell, it isn't
too much work either: you get a batch file whose name is handed to
you, run it through uncompress, then read through a big 'shar'-like
file which contains a stream of articles, each headed with an
indication of how many lines are contained therein.  To unpack,
simply put the article into its own temp file and check its
Message-ID against those already on the machine; if it is unique,
add it to the files on the machine, updating the active file and
the local group-specific sequence number.

Really what I'd like to do is to write an unpacker that will
immediately throw away articles depending on whether their groups
appear (or don't appear) in a file.  The goal would be to have the
file generated via a modified pexpire(1L) program to reflect JUST the
groups that people are actually actively reading on the machine.  ALL
other articles would vanish without a trace, never to take up disk
space at all!  Creating a nice clean piece of code that is easy to
understand, maintain, and modify would be a good side-benefit, as
would the far faster expire that could be written too (like
"find . -mtime +4 -exec /bin/rm"!), though even the need for a faster
expire would be greatly reduced once you were guaranteed that ONLY the
articles you're interested in are actually sitting on your disk.
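
As a sketch of the filtering idea, the following Python assumes the
file lists one wanted newsgroup per line (so it is a "keep" list, not
a "discard" list) and that an article is kept if any group in its
Newsgroups header is on that list.  The file format and the function
names are hypothetical, and header continuation lines are ignored.

```python
def parse_groups(lines):
    """Build the set of wanted groups from an iterable of lines.

    Assumed format: one group name per line; blank lines and lines
    starting with '#' are skipped.  An open file works as `lines`.
    """
    groups = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            groups.add(line)
    return groups


def newsgroups_of(article):
    """Return the groups named in the article's Newsgroups header."""
    head = article.split("\n\n", 1)[0]   # headers end at first blank line
    for line in head.split("\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "newsgroups":
            return [g.strip() for g in value.split(",") if g.strip()]
    return []


def keep_article(article, wanted):
    """True if at least one of the article's groups is wanted."""
    return any(g in wanted for g in newsgroups_of(article))
```

An unpacker would call keep_article() on each article before writing
anything to disk, so unwanted crossposts never touch the spool.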

This all really hinges on the history file, though.  Clearly, when
my expires take many many hours to run, it's because they're munging
through the slow and painful process of continually updating the DBM
history database ... (right?) ... I mean, I can run "fixactive(1L)"
and have it check *every* article in my /usr/spool/news directory in
under 2 minutes total!
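
For comparison, a history-free expire along the lines of the
"find . -mtime +4" idea could look roughly like this in Python.  It
is only a sketch: the function name and the dry_run default are my
own choices, it only considers files with all-digit names (which is
how articles are stored in the spool), and the age cutoff is not
identical to find's, since find rounds ages down to whole days.

```python
import os
import stat
import time


def expire(spool, max_age_days, now=None, dry_run=True):
    """Select article files under `spool` older than max_age_days.

    Returns the list of selected paths; the files are removed only
    when dry_run is False.  No history database is consulted.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400
    selected = []
    for dirpath, dirnames, filenames in os.walk(spool):
        for name in filenames:
            if not name.isdigit():
                continue                # not an article file
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue                # skip symlinks and the like
            if st.st_mtime < cutoff:
                selected.append(path)
                if not dry_run:
                    os.remove(path)
    return selected
```

Running expire("/usr/spool/news", 4) first, with the default
dry_run=True, shows what would go before anything is deleted.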

	I welcome thoughts on this, either here on the net or via
	email...and if you're interested in a similar piece of software,
	please feel free to drop me a note with your requirements too.

						-- Dave Taylor
Intuitive Systems
Mountain View, California

taylor@limbo.intuitive.com    or   {uunet!}{decwrl,apple}!limbo!taylor